Unsolved

January 19th, 2023 07:00

Hyper-v backup not reporting failures

Hello all

I have a NetWorker server at version 19.5.0.7. After a random check of my Hyper-V backups I discovered that, while the monitor showed the backups completing successfully, they were in fact being skipped because of checkpoints that existed on the Hyper-V server. After digging a little deeper I found that for the last 2 months I have been getting false reports from NetWorker and in fact do not have any backups for a lot of my systems. I had a look at the applogs from one of the servers, and they show that the VMs have checkpoints, but this information does not show up in the main console or the job logs.

Has anyone seen anything similar? 

January 23rd, 2023 12:00

We are not using Hyper-V backups (anymore); we only have VMware image-level backups.

I assume you are using RCT based backups and not VSS based?

You might want to sift through the notes in the NW NMM 19.5 guide, as it states for example:

RCT backups exclude the virtual machines that contain stale checkpoints.
● RCT backups of the virtual machines that contain any user checkpoints or recovery checkpoints are excluded, and marked as failed. Before you back up such virtual machines, merge the checkpoints by running the following PowerShell command:
Get-VM -Name <VM name> -ComputerName <Hyper-V host> | Get-VMSnapshot | Remove-VMSnapshot

RCT backups of the virtual machines that do not contain any checkpoints proceed.

Are the checkpoints in your case stale? In that case the VMs would be excluded, whereas VMs with user or recovery checkpoints would (or rather should) be excluded and marked as failed.

So what kind of checkpoints were you dealing with? Stale ones? I'd assume that NW would mention them being stale and therefore (silently) skip these VMs altogether.

However, https://www.dell.com/support/kbdoc/en-us/000182193/networker-hyper-v-rct-backup-fails-with-reason-saveset-contains-vm-with-checkpoint?lang=en seems to suggest that backups could also fail due to "stale user created or recovery checkpoints for the VM's that need to be checked and deleted".

But as in your case the backups were not failing, my guess is that stale checkpoints were involved and hence the VMs were skipped?

The thing is that in some cases NW is hardcoded to behave in a certain way (and the manuals do not always even state that this is the case). Alas, one then cannot choose to have NW report this as a failure anyway (as one can, for example, with filesystem backups, which can be regarded as failed if there is a warning and not only an error). So things can go unnoticed, simply because certain configurations are not supported.

Same for VMware backups: if a VM is configured to use quiesced snapshots and for whatever reason cannot do that, it will simply switch to non-quiesced and report no failure or warning that one could use as a trigger to regard the backup as failed. Or VMs that have IDE disks: unsupported, hence simply ignored, not causing the workflow to fail if I recall correctly.

"works as designed". SIGH

So now that you are aware of NW ignoring Hyper-V backups for certain VMs if they have stale checkpoints, you might have to regularly check for those from the Hyper-V end.
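A minimal sketch of what such a regular check could look like, in Python. It assumes you have already exported the checkpoint inventory of your hosts to CSV (for example from `Get-VMSnapshot` output), and that the export has `VMName`, `ComputerName` and `CreationTime` columns with ISO-formatted timestamps. The column layout, the timestamp format and the function name are assumptions for illustration; nothing here comes from NetWorker itself.

```python
import csv
import io
from datetime import datetime, timedelta


def vms_with_checkpoints(csv_text, now, max_age_days=0):
    """Return {(host, vm): age in days of the oldest checkpoint}.

    Only VMs whose oldest checkpoint is at least max_age_days old are
    returned. The CSV columns are an assumed export format.
    """
    oldest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        created = datetime.fromisoformat(row["CreationTime"])
        key = (row["ComputerName"], row["VMName"])
        # Keep only the oldest checkpoint seen per VM.
        if key not in oldest or created < oldest[key]:
            oldest[key] = created
    limit = timedelta(days=max_age_days)
    return {
        key: (now - created).days
        for key, created in oldest.items()
        if now - created >= limit
    }


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real environment.
    sample = (
        "VMName,ComputerName,CreationTime\n"
        "vm-app01,hv-host01,2022-11-20T02:00:00\n"
        "vm-app01,hv-host01,2023-01-10T02:00:00\n"
        "vm-db02,hv-host02,2023-01-18T23:30:00\n"
    )
    report = vms_with_checkpoints(sample, datetime(2023, 1, 19, 7, 0), max_age_days=1)
    for (host, vm), age in sorted(report.items()):
        print(f"{host}/{vm}: oldest checkpoint is {age} days old")
```

Any VM that shows up in this report is a candidate for being skipped by an RCT backup, so it is worth checking by hand whether a saveset actually exists for it.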

3 Posts

January 23rd, 2023 13:00

Hello Barry

Thanks for your feedback. True, most of the checkpoints were stale. I don't know if that was because they were almost 2 months old or because, as I suspect, there was a communication problem with the remote site at the time of the last backup. When we discovered these checkpoints we were not able to delete them from Hyper-V Manager, so I doubt that the command in the technote would work; we had to do it with the PowerShell command. Also, this technote says that the backup failed, not that it finished successfully. We also tried creating a checkpoint and running a backup, and even though the VM was not backed up, the backup job finished without errors.

My main concern is that it does not do this with all the Hyper-V servers. On some it just skips the VMs, and on others it fails with the expected checkpoint error. Either way, it is a very serious problem when the backup software just skips stuff and continues as though all is normal, and even though there is an error in the applog it does not report that error back, not even giving out a warning saying that there is a problem. Anything to indicate that not all is well.

There are more than 40 Hyper-V servers and more than 350 virtual machines in the infrastructure, so as you can understand it will be a problem to have to run the commands every few days. I think this is a huge bug in the software that needs to be addressed ASAP. Thankfully we did not have to restore any of these VMs and find ourselves without backups.
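With that many hosts and VMs, one way to avoid relying on the job status alone is to cross-check the list of VMs that should be protected against the time of their last actual saveset. The following is a rough Python sketch under stated assumptions: `expected_vms` and `last_backup` are hypothetical inputs that you would have to build yourself (for example from your VM inventory and from whatever you can export from NetWorker's media database), and the 48-hour threshold is arbitrary.

```python
from datetime import datetime, timedelta


def missing_backups(expected_vms, last_backup, now, max_age_hours=48):
    """Return sorted VM names that have no backup at all, or whose most
    recent backup is older than max_age_hours.

    expected_vms: iterable of VM names that should be protected.
    last_backup:  dict mapping VM name -> datetime of its latest saveset.
    Both inputs are assumptions about data you collect yourself.
    """
    cutoff = now - timedelta(hours=max_age_hours)
    return sorted(
        vm
        for vm in expected_vms
        if last_backup.get(vm) is None or last_backup[vm] < cutoff
    )


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    expected = ["vm-app01", "vm-db02", "vm-web03"]
    latest = {
        "vm-app01": datetime(2022, 11, 20, 2, 0),
        "vm-db02": datetime(2023, 1, 19, 1, 0),
    }
    for vm in missing_backups(expected, latest, datetime(2023, 1, 19, 7, 0)):
        print(f"no recent backup found for {vm}")
```

A check like this does not depend on how the backup software classifies a skipped VM, so it would also catch VMs that are silently excluded for reasons other than checkpoints.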

 

Again thanks for your reply 

January 24th, 2023 00:00

I completely understand your concerns.

In this case, if you have not done so already, I'd reach out to Dell about this, as otherwise things are unlikely to change anyway.

It would have to become clear how NW regards those checkpoints. Based on the manuals and KB articles, I would assume NW treats them as stale checkpoints, thus excluding the VMs from backup.

I'd also reach out to MS to find out how to query whether or not a checkpoint is considered stale, as NW seems to be able to tell the difference between stale checkpoints (skipping those VMs) and normal checkpoints (causing the backup to fail). Then you would be able to act upon finding any stale checkpoints by checking for them continuously...

You would want to be able to depend on NW to report clearly when it does so, and ideally to be able to select whether you want the backup to fail because of it instead of skipping such VMs, similar to the filesystem backups mentioned earlier, where one can opt to have the backup fail if files changed during the save, for example.

The more control over how NW behaves, the better I'd say...
