Closed

es78

13 Posts

0

2622

March 29th, 2023 11:00

Networker Backup jobs aren't working

Hi everyone,

This is my first post here, and I'm a newbie in Networker Backup. In our environment, we have several backup jobs, and some of them have been failing since last Sunday, when we had a power outage. See the log below:

03/29/23 15:00:00 Step (1 of 5): nsrjobd has made a request to start this savegrp with PID-11546.
03/29/23 15:00:00 Action backup traditional 'backup' has initialized as 'backup action job' with job id 4256470
03/29/23 15:00:00 Step (2 of 5): Querying the group or policy for the configured group of clients with the savegrp PID-11546.
03/29/23 15:00:00 FileServer01.abc.corp:E:\Files_Corp requested level=incr
03/29/23 15:00:00 FileServer01.abc.corp:F:\Files_Corp requested level=incr
03/29/23 15:00:00 Step (3 of 5): The group or policy information has been successfully returned.
03/29/23 15:00:00 Action backup traditional will run up to 100 jobs in parallel
03/29/23 15:00:00 Step (4 of 5): Creating a savefs job for all the configured clients.
03/29/23 15:00:00 Creating a 'savefs' job on the host 'FileServer01.abc.corp'.
03/29/23 15:00:00 Policy 'FILESERVER_F_FS01_INTERMED', workflow 'POLICY_3PM', action 'backup', group 'FILESERVER_FS01_INTERMED_POLICY_3PM'.
03/29/23 15:00:00 Starting action backup traditional 'backup', which has 1 clients.
03/29/23 15:00:00 Starting a session on the host 'FileServer01.abc.corp' to execute the job 'FileServer01.abc.corp:savefs', which scans the file system to determine the files for backup.
03/29/23 15:00:00 The job 'FileServer01.abc.corp:savefs' has been started on the client 'FileServer01.abc.corp'.
03/29/23 15:00:00 FileServer01.abc.corp:savefs started
03/29/23 15:00:00 savefs -s networkerbkp.abc.corp -c FileServer01.abc.corp -g FILESERVER_FS01_INTERMED_POLICY_3PM -p -l full -R -v -F "E:\\Files_Corp" "F:\\Files_Corp"
03/29/23 15:00:05 Group FILESERVER_FS01_INTERMED_POLICY_3PM waiting for 1 jobs (0 awaiting restart) to complete.
03/29/23 15:00:06 FileServer01.abc.corp:E:\Files_Corp
03/29/23 15:00:06 level=incr, vers=pools, p=8
03/29/23 15:00:06 FileServer01.abc.corp:F:\Files_Corp
03/29/23 15:00:06 level=incr, vers=pools, p=8
03/29/23 15:00:06 The job 'FileServer01.abc.corp:savefs' on the host 'FileServer01.abc.corp' has been completed.
03/29/23 15:00:06 FileServer01.abc.corp:savefs succeeded.
03/29/23 15:00:06 FileServer01.abc.corp:savefs The job has successfully scanned the file system on the host 'FileServer01.abc.corp'. The main save will now be started.
03/29/23 15:00:06 FILESERVER_FS01_INTERMED_POLICY_3PM:FileServer01.abc.corp:savefs See the file '/nsr/logs/policy/FILESERVER_F_FS01_INTERMED/POLICY_3PM/backup_4256470_logs/4256471.log' for command output.
03/29/23 15:00:06 Step (5 of 5): Creating a pseudo_saveset job for all the configured clients.
03/29/23 15:00:06 Creating a save job for the save set 'pseudo_saveset' on the host 'FileServer01.abc.corp'.
03/29/23 15:00:06 Parallel save streams per save set option value '-M#4' is being applied to save set 'pseudo_saveset'
03/29/23 15:00:06 Constructing the save command for the save set 'pseudo_saveset' on the host 'FileServer01.abc.corp': save -LL -s networkerbkp.abc.corp -g FILESERVER_F_FS01_INTERMED/POLICY_3PM/backup/FILESERVER_FS01_INTERMED_POLICY_3PM -a "*policy action jobid=4256470" -a "*policy name=FILESERVER_F_FS01_INTERMED" -a "*policy workflow name=POLICY_3PM" -a "*policy action name=backup" -y "Wed Apr 5 23:59:59 GMT-0300 2023" -w "Wed Apr 5 23:59:59 GMT-0300 2023" -f - -m FileServer01.abc.corp -M #4 -b FILESYSTEM_Diario -o "\"RENAMED_DIRECTORIES:index_lookup=on;REQUESTED_LEVEL:level=incr;\"" -l incr -q -W 78 -N pseudo_saveset "E:\\Files_Corp" "F:\\Files_Corp".
03/29/23 15:00:06 Executing a 'pseudo_saveset' job on the host 'FileServer01.abc.corp'. This job is an anchor save set for the workflow, and will be completed at the end of the client's backup.
03/29/23 15:00:06 FileServer01.abc.corp:pseudo_saveset started
03/29/23 15:00:06 save -LL -s networkerbkp.abc.corp -g FILESERVER_F_FS01_INTERMED/POLICY_3PM/backup/FILESERVER_FS01_INTERMED_POLICY_3PM -a "*policy action jobid=4256470" -a "*policy name=FILESERVER_F_FS01_INTERMED" -a "*policy workflow name=POLICY_3PM" -a "*policy action name=backup" -y "Wed Apr 5 23:59:59 GMT-0300 2023" -w "Wed Apr 5 23:59:59 GMT-0300 2023" -f - -m FileServer01.abc.corp -M #4 -b FILESYSTEM_Diario -o "\"RENAMED_DIRECTORIES:index_lookup=on;REQUESTED_LEVEL:level=incr;\"" -l incr -q -W 78 -N pseudo_saveset "E:\\Files_Corp" "F:\\Files_Corp"
03/29/23 15:00:57 The save job for the save set 'E:\Files_Corp' on the host 'FileServer01.abc.corp' has been completed.
03/29/23 15:00:57 Job 4256473 for client FileServer01.abc.corp exited with return code -1
03/29/23 15:00:57 Job 4256473 host: FileServer01.abc.corp savepoint: E:\Files_Corp had ERROR indication(s) at completion
03/29/23 15:00:57 FileServer01.abc.corp:E:\Files_Corp failed.
03/29/23 15:00:57 FileServer01.abc.corp:E:\Files_Corp will retry 1 more time(s).
03/29/23 15:00:57 FileServer01.abc.corp:E:\Files_Corp next retry in 1 seconds.
03/29/23 15:00:57 The save job for the save set 'F:\Files_Corp' on the host 'FileServer01.abc.corp' has been completed.
03/29/23 15:00:57 Job 4256474 for client FileServer01.abc.corp exited with return code -1
03/29/23 15:00:57 Job 4256474 host: FileServer01.abc.corp savepoint: F:\Files_Corp had ERROR indication(s) at completion
03/29/23 15:00:57 FileServer01.abc.corp:F:\Files_Corp failed.
03/29/23 15:00:57 FileServer01.abc.corp:F:\Files_Corp will retry 1 more time(s).
03/29/23 15:00:57 FileServer01.abc.corp:F:\Files_Corp next retry in 1 seconds.
03/29/23 15:00:58 The save job for the save set 'pseudo_saveset' on the host 'FileServer01.abc.corp' has been completed.
03/29/23 15:00:59 FileServer01.abc.corp:pseudo_saveset succeeded.
03/29/23 15:00:59 FileServer01.abc.corp:pseudo_saveset Save has closed the session on the host 'FileServer01.abc.corp'.
03/29/23 15:00:59 FILESERVER_FS01_INTERMED_POLICY_3PM:FileServer01.abc.corp:pseudo_saveset See the file '/nsr/logs/policy/FILESERVER_F_FS01_INTERMED/POLICY_3PM/backup_4256470_logs/4256472.log' for command output.
03/29/23 15:01:04 Action backup traditional 'backup' with job id 4256470 is exiting with status 'failed', exit code 1
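
For anyone triaging a similar log, the failing save sets and their job exit codes can be pulled out mechanically. Below is a minimal sketch in Python, assuming the log lines have the same shape as the excerpt above. The function name `summarize` and both regular expressions are illustrative and not part of NetWorker; other versions may word these messages differently.

```python
import re

# Patterns modelled on the lines in the log excerpt above (assumption:
# other NetWorker versions may phrase these messages differently).
FAILED_RE = re.compile(r"^\S+ \S+ (?P<saveset>.+) failed\.$")
EXIT_RE = re.compile(
    r"Job (?P<job>\d+) for client (?P<client>\S+) exited with return code (?P<rc>-?\d+)"
)


def summarize(log_text):
    """Return (failed_save_sets, job_exit_codes) found in a savegrp-style log."""
    failed = []
    exit_codes = {}
    for line in log_text.splitlines():
        line = line.strip()
        m = FAILED_RE.match(line)
        if m:
            failed.append(m.group("saveset"))
        m = EXIT_RE.search(line)
        if m:
            exit_codes[int(m.group("job"))] = int(m.group("rc"))
    return failed, exit_codes


# Lines copied from the log above.
sample = """\
03/29/23 15:00:57 Job 4256473 for client FileServer01.abc.corp exited with return code -1
03/29/23 15:00:57 FileServer01.abc.corp:E:\\Files_Corp failed.
03/29/23 15:00:57 Job 4256474 for client FileServer01.abc.corp exited with return code -1
03/29/23 15:00:57 FileServer01.abc.corp:F:\\Files_Corp failed.
03/29/23 15:00:59 FileServer01.abc.corp:pseudo_saveset succeeded.
"""

failed, exit_codes = summarize(sample)
print(failed)
print(exit_codes)
```

This only confirms which save sets failed; the actual reason has to come from the per-job log files that the output points to.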

Can anyone help me?

Thank you so much

Best regards

Responses(22)

C

crazyrov

4 Operator

•

1.3K Posts

0

March 30th, 2023 20:00

Thanks for the details!
The error for your backup failures is "No matching device ..........". This means that the backup device is unavailable. If you are using a Data Domain, please ensure all the devices are mounted and available. If you are using a Tape Library ensure that it is in a ready state. Let me know how it goes.

B

barry_beckers

393 Posts

0

April 17th, 2023 12:00

At least the 2nd device shows no volume mounted, so mount that one.

You stated that several backups fail. Do all of them use that same device? Are the failing ones all using the same pool? I assume you have multiple pools and each has its own device?

No idea how large your environment is, but ours can get very large. We have various pools and, depending on the total number of clients, each pool has at least one device but can also have several. These can even be defined for the same NW storage node, sharing the same volume in that pool, to be able to go beyond 60 simultaneous backup sessions towards that pool. However, as you use a DD6300, those only have 270 streams with DDOS 7.x (according to https://www.dell.com/support/kbdoc/en-us/000186032/supported-stream-counts-for-ddos-7-x?lang=en), so with the number of devices you already have, you might already hit that maximum if all devices were maxing out at 60 sessions each. So you might already be good there (also depending on the NW server parallelism setting, which states how many backup sessions are allowed to run at most, besides having max sessions also set on devices).
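
To make the stream arithmetic concrete, here is a rough sketch in Python. It uses only the two figures mentioned above (270 streams, 60 sessions per device) and treats 270 as one combined limit, which is a simplification; the function names are made up for illustration.

```python
# Figures taken from the post above: 270 streams for a DD6300 on DDOS 7.x
# (as quoted from the linked KB) and up to 60 sessions per NetWorker device.
DD_MAX_STREAMS = 270
SESSIONS_PER_DEVICE = 60


def devices_before_limit(max_streams, sessions_per_device):
    """Number of devices that can all run at max sessions without exceeding max_streams."""
    return max_streams // sessions_per_device


def total_sessions(device_count, sessions_per_device):
    """Sessions in flight if every device is fully loaded."""
    return device_count * sessions_per_device


full_devices = devices_before_limit(DD_MAX_STREAMS, SESSIONS_PER_DEVICE)
print(full_devices)                                       # 4
print(total_sessions(full_devices, SESSIONS_PER_DEVICE))  # 240
print(total_sessions(full_devices + 1, SESSIONS_PER_DEVICE) > DD_MAX_STREAMS)  # True
```

So under these assumptions, four fully loaded devices stay under the limit (240 sessions), while a fifth fully loaded device would exceed it (300 sessions).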

So did mounting that one single device now solve the issue?

As said, you can enable "auto media management" for each ddboost device so that NW will try to (re)mount a device when it is not mounted, for example after a NW restart. It might not always be what you want, but you can use it if so desired.

BTW, that is a fairly old DDOS version (6.0.2.20) you have on that DD, which has not even been supported since July 2020 according to https://www.dell.com/support/kbdoc/en-us/000185734/all-dell-emc-end-of-life-documents?lang=en. The only 6.x still supported nowadays is 6.2, and even that has an EOSS date of May 31st 2023.

So which NW version are we even dealing with here? Also old and no longer supported? No support contract (anymore), I assume? A recent NW 19.x, for example, does not support DDOS 6.0. However, not supported does not mean that it won't work. But I assume that you would then also have an older NW version?

Even less reason to envy you...

C

CrashCart

1 Rookie

•

45 Posts

0

April 17th, 2023 12:00

Well, there you go then. Looks like CrazyROV and Barry Beckers were right.

Right-click on the device, and select "Mount" (where Volume Name is blank)

C

crazyrov

4 Operator

•

1.3K Posts

0

March 30th, 2023 02:00

Hello @es78 ! Welcome to the community. Can you provide a little more information on the failure?
Is this the only client that is failing after the power outage, or are other backups working fine?
Can you check if the server has come up after the power outage?
Can you also check if the NetWorker services on the client are running? You can do this by running `nsrrpcinfo -p client_hostname` from the NetWorker server.

E

es78

13 Posts

0

March 30th, 2023 04:00

Hello @crazyrov , thank you for having me here in the community!

Speaking of the problem: no, almost all of my backups are failing, except for one or two policies, as you can see in the picture below:

Policies bkp.PNG

- Below the result of the command you posted (from the Networker Server):

nsrrpcinfo command.jpg

Thank you very much for your valuable assistance

Best wishes

E

es78

13 Posts

0

March 31st, 2023 08:00

Hi @crazyrov ,

Thank you once again for your valuable help! So, we use DataDomain in our environment. Apparently there is no problem with DataDomain; there are no visible warning signs. Is there any way (or command) we can use to check whether DataDomain is OK?

Best wishes

E

es78

13 Posts

0

March 31st, 2023 12:00

@crazyrov

One question, please: can I reboot DataDomain? Is there any problem if I reboot it?

C

crazyrov

4 Operator

•

1.3K Posts

0

March 31st, 2023 23:00

I am referring to the DD devices that are configured in NetWorker. Once you have logged into the NMC and launched the NetWorker administrator, navigate to the `Devices` tab and then click `Devices` again in the left panel. All the configured devices should be listed in the right panel. You will need to ensure that the DD devices here are mounted; an indication of this is that the volume name for the device is visible in the right panel.

Check this video out, it might help you understand - https://youtu.be/G_E0h4q1D4E

C

crazyrov

4 Operator

•

1.3K Posts

0

March 31st, 2023 23:00

You can reboot it, but please don't. First you need to see what the actual issue is.

E

es78

13 Posts

0

April 4th, 2023 10:00

Hi @crazyrov

Thank you very much for your explanation. And I'm sorry for my late reply; unfortunately something came up this weekend, and I couldn't answer sooner. Well, I am going to check your video out and compare it with my case. I will return with the result.

Thank you once again

Best wishes

B

barry_beckers

393 Posts

0

April 11th, 2023 08:00

You say you are new to Networker but speak about "our environment". Isn't there anyone who actually knows how the environment is set up and how to validate functionality? Like the bare minimum: checking whether the ddboost devices are mounted on the NW server, or on the NW storage node(s) if you use separate systems to mount backup devices on?

We don't know your environment, but neither do you, it seems. That makes it rather difficult to point you in a specific direction. I mean, if you are asking whether the Data Domain can be rebooted, you should know whether or not it can be, for example if it is used by other environments as well.

We have to assume at least a bare minimum of knowledge about how NW and Data Domain work with each other, to know what to look for and what to test to see if everything is up again as it should be. For example, whether the Data Domain even started up after the power outage. I would expect the backup admin to actually manage and verify a DD.

So can you see the backup volumes mounted for the defined devices? Can you unmount/re-mount the ddboost backup devices? Assuming you use the DD with ddboost backup devices and not as a VTL?

Also the logs stated things like :

"See the file '/nsr/logs/policy/FILESERVER_F_FS01_INTERMED/POLICY_3PM/backup_4256470_logs/4256472.log' for command output."

Did you actually look into these log files?
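
As a small aid for finding those files, the referenced paths can be collected from the job output automatically. This is only a sketch in Python, assuming the "See the file '...' for command output" wording shown in the log above; the function name `referenced_logs` is hypothetical.

```python
import re

# Matches the "See the file '<path>' for command output" wording seen in the log above.
LOG_PATH_RE = re.compile(r"See the file '([^']+)' for command output")


def referenced_logs(log_text):
    """Return the per-job log file paths mentioned in the text, in order, without duplicates."""
    seen = []
    for path in LOG_PATH_RE.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


# Two lines copied from the original post's log.
sample = (
    "03/29/23 15:00:06 FILESERVER_FS01_INTERMED_POLICY_3PM:FileServer01.abc.corp:savefs "
    "See the file '/nsr/logs/policy/FILESERVER_F_FS01_INTERMED/POLICY_3PM/"
    "backup_4256470_logs/4256471.log' for command output.\n"
    "03/29/23 15:00:59 FILESERVER_FS01_INTERMED_POLICY_3PM:FileServer01.abc.corp:pseudo_saveset "
    "See the file '/nsr/logs/policy/FILESERVER_F_FS01_INTERMED/POLICY_3PM/"
    "backup_4256470_logs/4256472.log' for command output.\n"
)

paths = referenced_logs(sample)
for path in paths:
    print(path)
```

Note that the excerpt only names the logs for the savefs and pseudo_saveset jobs; the logs for the failed jobs 4256473 and 4256474 are not named there, so where they live is something to verify on the server.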

E

es78

13 Posts

0

April 12th, 2023 09:00

Hi @zinco

Thank you for answering. Well, we have other analysts in our environment, but none of them is a backup specialist. Actually, they barely know Networker Backup. The last backup analyst quit, and since then I have been responsible for backup, because I know a few things about it. But, as you can see, I'm definitely not a specialist, hands down. And, as I said above, we had a power outage, and I have had this problem since then.

Anyway, I will check these logs again... but I think I have to reboot Data Domain (I have to talk to my boss about it)... I looked at everything (or almost everything) and didn't see any visible problem... anyway, I'm lost...

Best regards.

B

barry_beckers

393 Posts

0

April 17th, 2023 04:00

No idea what a Data Domain reboot would even "solve", unless you see that something is the matter with the DD system. But that should be verified on the DD end: whether or not the NICs are online, the filesystem is running, ddboost is enabled, etc. So the very basics...

E

es78

13 Posts

0

April 17th, 2023 07:00

@zinco

I rebooted Data Domain, and yet the problem persists. As a matter of fact, I can't see any problem with it. How can I check whether the filesystem is running, ddboost is enabled, etc.?

Here is a picture showing that ddboost is enabled, for example:

And here, the NFS:

So, it's inexplicable, from my point of view. Is there another way to check if these things are OK (filesystem, ddboost, etc.)?

B

barry_beckers

393 Posts

0

April 17th, 2023 11:00

Do you have actual access to the Dell support site, and thus to the Knowledge Base?

You have not shown what the status of the ddboost devices is within Networker, or whether the devices are even mounted.

https://www.dell.com/support/manuals/en-us/networker/nw_p_ddboost_int_guide_19.8/volume-unavailable-error?guid=guid-dfc6b599-1ecd-4a8b-9e2a-582ae056cdfa&lang=en-us

I mean, we are talking about the very basic things to know and look at here. It is difficult for me to say even where to start, as I have to assume that these things are known; the "Dell EMC NetWorker 19.8 Data Domain Boost Integration Guide" linked above, or even the NW admin guide, might already be out of your league.

Assuming things are configured on the DD end as stated on https://www.dell.com/support/kbdoc/en-us/000008226?lang=en: I see that in the screenshots above the ddboost status shows "enabled". However, no info is shown about the "Storage unit" in use, the DD mtree created for ddboost usage.

Maybe " auto media management" is set to No for all ddboost devices, so that NW does not try to automatically mount any of the NW ddboost devices?

The command below, run via CLI, would show any device that has no volume mounted on it (assuming you have a Linux-based backup server):

# nsrmm | grep -i nothing
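
If `grep` is not at hand (for example when the output was saved to a file and is being looked at on another machine), the same case-insensitive filter can be sketched in Python. The function name `lines_mentioning` is made up for illustration, and the sample text is a hypothetical placeholder, NOT real `nsrmm` output.

```python
def lines_mentioning(text, needle="nothing"):
    """Case-insensitive line filter, equivalent in spirit to `grep -i nothing`."""
    needle = needle.lower()
    return [line for line in text.splitlines() if needle in line.lower()]


# Hypothetical placeholder text, NOT real nsrmm output.
sample = "device-a: volume mounted\ndevice-b: Nothing mounted\n"

matches = lines_mentioning(sample)
print(matches)
```

To use it on real data, pass in the saved text of the `nsrmm` output instead of `sample`.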

You'd have to give us more to chew on, to even try to point into a certain direction...
