Unable to restore Exchange 2010 DB with NMM v2.3

Question

Hi Guys,

I am having trouble restoring an Exchange 2010 DB, either via RDB or to an alternate DB.

The error I get is the following:

NMM .. Starting Recovery. This operation may take a long time depending on the size of the data requested.

82296:nsrsnap_vss_recover:NMM .. Operation unit failed with error 'br-module: Tape Restore Failed...'.Possible cause: 1)Unsupported file system or 2)write-protected disc or 3)No space on disc or 4)Drive not found

powershell.Invoke Mount-Database failed.

Cannot invoke this function because the current host does not implement it.

81188:nsrsnap_vss_recover:

NMM .. Failed to mount Database [rdb12]. Client will not attempt any further mounts.

**************************************************************

Recover completion status

Files failed to be restored for Microsoft Exchange 2010.

Exchange restore completed for STS4.

Also, the NMM version I used is 2.3 build 109. I uninstalled NMM and re-installed v2.3 build 113, with no improvement.

The Exchange 2010 environment is a DAG, which consists of two Mailbox nodes installed on Windows 2008 R2. The two nodes have been restarted without any luck.

Moreover, the restore needs ~250 GB and for the moment we have ~ 600GB free space in the same location.

The restore folder is d:\restore; the edb is set to restore to d:\restore\edb and the logs to d:\restore\logs. Additionally, I have tried restoring the logs to a different drive. The restore still failed every time, in more than a dozen attempts across the different scenarios mentioned above.

Could anyone provide additional assistance with this?

Thanks,

Bogdan

CarlosRojas · Answer

Hi Bogdan,

This looks to me like a timeout, based on the size of the DB (approx. 250 GB).

Have you set the TCP/IP timeout values on the NetWorker server and the Exchange clients?

To avoid timeouts, please do the following on the NetWorker and Exchange servers:

1.- Set Group Inactivity Timeout to "0" in the Exchange client configuration in NMC.

2.- Add the TCP/IP settings as described in the article above, and if it still fails I would suggest adding the following values on both client and server:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize=256000

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\GlobalMaxTcpWindowSize=16777216

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveInterval=1000

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime=600000

On Solaris:

#ndd -get /dev/tcp tcp_keepalive_interval

7200000

#ndd -set /dev/tcp tcp_keepalive_interval 3360000

These changes will require a reboot.
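For the Windows side, the four values listed above could be turned into `reg add` command lines with a small script. This is only a sketch: it assumes every value is stored as REG_DWORD (please confirm before applying anything), and the names `TCPIP_KEY`, `SUGGESTED_VALUES` and `build_reg_add_commands` are made up for illustration. It only prints the commands; it does not touch the registry.

```python
# Sketch only: builds `reg add` command lines for the TCP/IP values suggested above.
# Assumption: every value is stored as REG_DWORD; confirm before applying.
TCPIP_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Values copied from the list above.
SUGGESTED_VALUES = {
    "TcpWindowSize": 256000,
    "GlobalMaxTcpWindowSize": 16777216,
    "KeepAliveInterval": 1000,
    "KeepAliveTime": 600000,
}


def build_reg_add_commands(values=SUGGESTED_VALUES, key=TCPIP_KEY):
    """Return one `reg add` command string per registry value."""
    return [
        f'reg add "{key}" /v {name} /t REG_DWORD /d {data} /f'
        for name, data in values.items()
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before anyone runs them.
    for command in build_reg_add_commands():
        print(command)
```

Printing rather than executing keeps the change reviewable by the Exchange administrators, and the reboot mentioned above is still needed afterwards.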

Are you recovering from a full backup or from full + incremental backups?

After the reboot, please run the recovery again, but first delete any existing RDBs and the files related to them.

Anyway, did you run NMM Config Checker to ensure that all prerequisites are met?

If all prerequisites are OK, can you please share the nmm.raw from the client you are starting the restore from so we can take a look?

Thank you.

Carlos.

CarlosRojas · Answer

Hi Bogdan,

Yes, NMC can show 100% because the pure NetWorker operations (searching the index, retrieving data from media, sending data to the client) have completed, while the NMM/Exchange tasks may not have finished yet. In fact they did not, since the restore failed.

The TCP values should be set. At the very least, increase the ones already present to the values I sent earlier, so that you avoid any TCP timeout.

I noticed a very common error: using the wrong DAG name for the restore.

For backups you have set this:

-A NSR_EXCH2010_DAG=excbu01vc.star.ro

But for restore you have set this one:

nsrsnap_vss_recover: flag=A arg=NSR_EXCH2010_DAG=EXCBU01VC

So in this case you are using the upper-case short name, whereas you should be using the lower-case FQDN as configured in the backup. Please do as follows:

1.- Stop the NetWorker services, including the two Replication Manager services

2.- Delete /nsr/tmp on the client

3.- Start the services, including Replication Manager AgentPS, and leave Replication Manager Exchange Interface stopped; it is set to manual and will be started by NMM when required.

4.- Open NMM GUI

5.- Select the correct name for the DAG (lower-case FQDN). If it is in the drop-down menu, go to step 9

6.- If the lower-case FQDN is not present in the drop-down menu, go to Options->Configuration and click the "Refresh" button.

7.- Select the correct name from the left list and click Add

8.- Select the correct name in the options window and click Select

9.- Once you have set the correct DAG name, run a new restore with a debug level set, let's say level 3.

If it fails again please attach the nmm.raw again.
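The DAG name comparison described above can also be checked mechanically. The following is a minimal sketch, assuming the name always appears as `NSR_EXCH2010_DAG=<name>` both in the backup Application Information and in the nmm.raw line; the function names are made up for illustration and are not part of NMM.

```python
import re

# Matches the value of NSR_EXCH2010_DAG in a backup argument or an nmm.raw line.
DAG_PATTERN = re.compile(r"NSR_EXCH2010_DAG=([^\s,;]+)")


def extract_dag_name(text):
    """Return the DAG name found in text, or None if the variable is absent."""
    match = DAG_PATTERN.search(text)
    return match.group(1) if match else None


def dag_names_match(backup_text, restore_text):
    """True only when both names are present and identical, including case."""
    backup_name = extract_dag_name(backup_text)
    restore_name = extract_dag_name(restore_text)
    return backup_name is not None and backup_name == restore_name


if __name__ == "__main__":
    backup = "-A NSR_EXCH2010_DAG=excbu01vc.star.ro"
    restore = "nsrsnap_vss_recover: flag=A arg=NSR_EXCH2010_DAG=EXCBU01VC"
    print(extract_dag_name(backup), extract_dag_name(restore))
    print(dag_names_match(backup, restore))
```

The comparison is deliberately exact (case-sensitive, no short-name expansion), because the point is that the restore should use the very same string that was configured for the backup.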

Thank you.

Carlos.

Bogdanionut_bad · Answer

Nmm.raw has been uploaded to the first post. Thanks, Bogdan

Bogdanionut_bad · Answer

Hi Carlos,

Thanks for your quick reply.

The restore provisions the space needed on the destination drive. Also, it is shown as 100% completed in NMC. Still, after the restore session ends, if I go to the destination I find that the restored data has been deleted, and NMM shows the message I mentioned.

Moreover, if the problem were caused by a TCP/IP timeout, would it be normal for the progress bar in NMC to show 99%-100%?

The Group Inactivity Timeout is set to 0 for the Exchange group.

Also, two of the four reg keys you mentioned are present:

TcpWindowSize - 64240

keepalivetime - 300000
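To make the gap between the values found on the node and the ones suggested earlier in the thread explicit, a comparison could be sketched as below. The helper name `compare_tcp_settings` is hypothetical, and the lookup ignores case on the assumption that registry value names are case-insensitive.

```python
# Values suggested earlier in the thread for Tcpip\Parameters.
SUGGESTED = {
    "TcpWindowSize": 256000,
    "GlobalMaxTcpWindowSize": 16777216,
    "KeepAliveInterval": 1000,
    "KeepAliveTime": 600000,
}


def compare_tcp_settings(current, suggested=SUGGESTED):
    """Return {name: (current_or_None, suggested)} for missing or different values."""
    # Normalise names to lower case so "keepalivetime" matches "KeepAliveTime".
    current_by_lower = {name.lower(): value for name, value in current.items()}
    report = {}
    for name, target in suggested.items():
        present = current_by_lower.get(name.lower())
        if present != target:
            report[name] = (present, target)
    return report


if __name__ == "__main__":
    found_on_node = {"TcpWindowSize": 64240, "keepalivetime": 300000}
    for name, (present, target) in compare_tcp_settings(found_on_node).items():
        print(f"{name}: current={present} suggested={target}")
```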

I am unable to modify or create reg keys for the moment because I need approval from the Exchange Administrator.

I will upload the nmm.raw asap.

Regards,

Bogdan

Bogdanionut_bad · Answer

Hi Carlos,

Unfortunately I am unable to schedule a Mailbox node restart so that the reg keys you mentioned could be used.

I will try to schedule the restart by the end of the week.

Also, I have attached a new nmm.raw.

I really appreciate your help.

Thanks,

Bogdan

CarlosRojas · Answer

Hi Bogdan,

I couldn't find much useful information in the nmm.raw with the debug option enabled.

I would suggest that you install NMM Config Checker and run it to ensure all prerequisites are met.

It also installs a log gathering tool called "NMM EMC Reports".

If you could run it and attach the zip file created by NMM EMC Reports, that would help a lot to find out what is going on. Otherwise, you can check the Exchange events in Event Viewer, as the final step that fails is the mount DB operation.

I found this KB article that describes an issue similar to yours, although it talks about Exchange 2007 CCR and NMM 2.2 SP1.

https://solutions.emc.com/emcsolutionview.asp?id=esg116592

and also this one, but I think it doesn't apply, right?

https://solutions.emc.com/emcsolutionview.asp?id=esg126442

Do you have the system path and log path configured for that DB?

Also, are you trying to restore from the active DB node or the passive one?

Did you delete all RDBs and their folders and files before the new restore attempt?

Does the user have sufficient permissions on the DBs?

You definitely need to run NMM Config Checker (available on Powerlink) and check the output (you can attach it here). If nothing shows up there, I would suggest running NMM EMC Reports and posting the zip file here so I can take a look.

Thank you.

Carlos.

Bogdanionut_bad · Answer

Carlos,

NMM Config Checker has been run and it shows everything is OK.

I have attached the EMC Reports. Also, the restore is done from a full backup.

All RDB folders and RDBs have been deleted after each failed restore.

The user has permissions on the DBs and is the same one with which Config Checker was run.

Regarding the nodes, from what I understand, Exchange 2010 uses only active nodes, so they are both active.

The production DB has the logs on one drive and the edb on another, according to Microsoft recommendations.

CarlosRojas · Answer

Hi Bogdan,

In DAG environments with 2 nodes, only one of them hosts the DB in "Active" mode; the other one hosts the passive copy, and in some cases this is important to know. Have you tried to restore the same DB from the other node?

I didn't find your answer about removing old RDBs. Have you done that? In the past I have seen restore issues when previously created RDBs were still present.

Have you checked the availability of all savesets for FULL and INCR backups?

I'll review the logs and will get back to you tomorrow.

Thank you.

Carlos.

Bogdanionut_bad · Answer

Hi Carlos,

The restore has been attempted from both nodes. I have already started the restore from the other node, using the FQDN in lowercase as you advised yesterday. I'll let you know how it goes.

Also, old RDBs have been removed.

As I have mentioned, this is a full backup, but I do not understand what you mean by "Have you checked the availability of all savesets for FULL and INCR backups?".

Thanks,

Bogdan

CarlosRojas · Answer

Hi Bogdan,

I would scan in the saveset that is recoverable for the APPLICATIONS saveset.

Also check the index using the nsrinfo command with the correct flags, including the savetime and the client name.

This should show you the indexes for that client.

You should see all the savesets: application, coverset, writer metadata and backup metadata.

APPLICATIONS:\Microsoft Exchange 2010

There should be 2 of these, one for the DB and another one for the logs.

You should also see at least one more, most likely under the physical node name; that would be the backup metadata.

C:\Program Data......\xxx.xml

This would be the writer metadata. Without this file you won't be able to restore.

VSS:\

This is the coverset.

Scan in all the savesets for that client on that savetime.

You will need all of those savesets to be able to restore. However, based on the logs, I don't think the issue here is related to savesets being expired, but to a timeout.

So please have the Exchange admins apply the required changes on both Exchange nodes and reboot, and do the same on the NetWorker server and reboot.
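The saveset checklist above (two APPLICATIONS savesets, the .xml writer metadata and the VSS coverset) could be checked against a list of saveset names with something like the sketch below. It assumes the names have already been collected by hand, for example from the media database listing for that client and savetime; `missing_saveset_kinds` is a made-up helper and the .xml path in the example is purely hypothetical.

```python
def missing_saveset_kinds(saveset_names):
    """Return a list describing which expected saveset kinds are absent."""
    names = [name.strip().lower() for name in saveset_names]

    # Two application savesets are expected: one for the DB, one for the logs.
    application_count = sum(
        1 for name in names
        if name.startswith("applications:\\microsoft exchange 2010")
    )
    has_metadata = any(name.endswith(".xml") for name in names)
    has_coverset = any(name.startswith("vss:") for name in names)

    missing = []
    if application_count < 2:
        missing.append(
            f"application savesets (found {application_count}, expected 2)"
        )
    if not has_metadata:
        missing.append("writer metadata (.xml)")
    if not has_coverset:
        missing.append("coverset (VSS:)")
    return missing


if __name__ == "__main__":
    example = [
        r"APPLICATIONS:\Microsoft Exchange 2010",
        r"APPLICATIONS:\Microsoft Exchange 2010",
        "VSS:/",
        r"C:\hypothetical\writer.xml",  # hypothetical path, for illustration only
    ]
    print(missing_saveset_kinds(example))
```

An empty list only means that each kind of saveset is present by name; it says nothing about whether they are still browsable or recoverable.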

Let us know how it goes.

Thank you.

Carlos

Bogdanionut_bad · Answer

Hi Carlos,

The restore is done from a saveset that has previously been marked as recyclable and for which I recovered the index.

If I go to Required Volumes in NMM, 3 tape volumes are displayed, the ones that contain the saveset I have recovered.

All the savesets have the DAG cluster hostname as client name, and I see that during a backup operation some very small savesets contain the node name. However, I cannot find any savesets from the time I want to restore that contain the DAG node hostname, only the VSS:/ and *.xml savesets with the DAG cluster hostname as client name.

I have attached a screenshot with the savesets I have for restore.

Is this sufficient?

Bogdanionut_bad · Answer

Hi Carlos,

I have recovered the index for the recoverable saveset APPLICATIONS:\Microsoft Exchange 2010 only and successfully restored the required data. This is pretty strange to me, because NMM showed the DB as restorable even before the index recovery.

Also, the *.xml saveset did not need to be marked browsable for the restore.

Still, I am curious how you are able to analyze the EMC Reports I have sent you. Do you use Notepad or some other kind of utility?

Thanks for all your help,

Bogdan

CarlosRojas · Answer

Hi Bogdan,

So you were able to restore the data eventually?

The .xml files are always recoverable. Even right after the backup you will see them as recoverable, because the retention for that saveset (metadata) is directly tied to the DB backup.

To analyze the logs I use different methods:

1.- For nmm.raw just render the log so you can read it.

2.- You can also use NMM Log Viewer, which is installed along with NMM Config Checker.

3.- Event Viewer to see Application, System and Exchange logs.

For massive logs there are some free log viewers that let you open logs of up to 2 GB with almost no memory usage.

Thank you.

Carlos.

Bogdanionut_bad · Answer

Hi Carlos, Sorry for the delay, yes, I have been able to recover the data. Regards, Bogdan

CarlosRojas · Answer

Thanks for the feedback, Bogdan. Carlos.
