Unsolved

1 Rookie

 • 

6 Posts

May 15th, 2024 11:11

NetWorker alert - Daily save set count exceeded

Hi,

We have a new alert: "NetWorker recommended threshold limit is exceeded. Daily save set count recommendation is 100000. Current value is 101420. NetWorker Performance Optimization planning guide provides more information."

I have checked the "Performance Optimization planning guide" and I haven't found any related information.

Maybe I have missed something. Does anyone have any information about this alert?

1 Rookie

 • 

117 Posts

May 17th, 2024 12:46

NetWorker has been able to handle 100K jobs daily for some time now. It seems that it now actually generates a warning when you exceed that? What is the NW server version you encountered this in?

The same goes for an NMC: if, for example, you have a shared NMC and manage multiple NW servers with it, it also has a limit of 100K jobs daily. Hence we have more than one NMC, to be able to spread the load amongst them and remain below 100K jobs...

It is also stated in the NVE (NetWorker Virtual Edition) guide, which talks about small (up to 10K jobs daily), medium (up to 50K jobs daily) and large workloads (up to 100K jobs daily).

As is clearly stated in the NW 19.9 "Performance Optimization planning guide":


"Distribution of workflows and jobs
It is recommended that you configure 2000 jobs per workflow and a maximum of 100 workflows running concurrently with a maximum limit of 100K jobs per day. Exceeding these limits will significantly increase the load on the NetWorker server and on the user interface (UI) response time, and will impact the reliability of the system. NetWorker can process 1024 jobs at a time and rest of the jobs are queued. It is also recommended that you do not exceed more than 6000 jobs in a queue at any fixed point in time. To prevent overloading the server, stagger the workflow start time.
With the Message Bus Architecture between NetWorker Management Console (NMC) and the NetWorker server, the NMC can sustain 100K jobs per day. The following table describes how you can configure the jobs into one or more policies and schedule them with different time intervals."

and:

"Evaluate the workload for any datazone with more than a 100K jobs per day, and consider moving some of jobs to other datazones."

The 100K limit has been there for quite some time now.

1 Rookie

 • 

1 Message

20-05-2024 05:09 AM

@bbeckers1
Hi,

Thank you for the explanation. I am still confused, because we have a fairly small environment and we don't have nearly as many workflows/jobs as described in the guide. Also, the workflow start times are staggered so they don't interfere with one another.

I will check this part of the guide in detail and see where the "problem" is in our environment.

We are using NW server version 19.10 VE.


1 Rookie

 • 

117 Posts

21-05-2024 19:16

@qwerty94​ 

You can have a look at the total counts for save, recover and clone jobs:

# printf 'show job id\np type: save job\np type: recover job\np type: clone job\n' | jobquery -s localhost -i - | grep job | wc -l

That, however, is the total count of these three job types in the NW jobsdb. By default the jobs retention is 3 days (or rather 72 hours, as that is the default setting in the backup server configuration), so you would have to divide that number by three to get the average number of jobs per day.
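
A rough sketch of that calculation, building on the command above (the divisor of 3 is an assumption based on the default 72-hour jobsdb retention, so adjust it if your retention differs):

# count save, recover and clone jobs in the jobsdb and estimate the daily average
DAYS=3   # assumed jobsdb retention in days (default is 72 hours); adjust to your setting
TOTAL=$(printf 'show job id\np type: save job\np type: recover job\np type: clone job\n' \
        | jobquery -s localhost -i - | grep job | wc -l)
echo "total jobs in jobsdb: $TOTAL"
echo "average jobs per day: $(( TOTAL / DAYS ))"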


job types (NW 19.9):
                 Known types: active job db, backup action job,
                              bootstrap save job, clone job,
                              generic remote command, job indication,
                              notification job, post job, pre job,
                              probe action job, probe job, recover job,
                              save job, savefs job, session info, task job,
                              utility job, workflow job;

So if the large number does not come from these three job types alone, you could have a look at some of the other job types mentioned above to see what causes this number of jobs. You would then want to show more than just the job id, for example also the name, to know what kind of job it is (see the sketch after the documentation link below).

https://www.dell.com/support/manuals/en-us/networker/nw_p_nwadmin/using-jobquery?guid=guid-5aa1a6f9-4804-421c-b0ef-10e31767ef4e&lang=en-us
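
As an example, a per-type count along the same lines as the command above (just a sketch; the type names are taken from the "Known types" list shown earlier):

# count a few job types separately to see which one dominates
for t in 'save job' 'recover job' 'clone job' 'savefs job' 'session info' 'workflow job'; do
    n=$(printf 'show job id\np type: %s\n' "$t" | jobquery -s localhost -i - | grep job | wc -l)
    echo "$t: $n"
done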


1 Rookie

 • 

6 Posts

23-05-2024 06:32 AM

@bbeckers1​ 

I have run the command you gave me and here are the results:

save job: 46826
session info: 57853

These two job types are the largest (the numbers are already divided by 3, since I have a 72-hour job retention). All other job types are below 3000 per day.

Now I see where the number in the alert comes from.
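
As a rough back-of-the-envelope check (assuming the alert simply counts all jobs per day, and keeping in mind the exact total will vary from day to day):

echo $(( 46826 + 57853 ))   # 104679, in the same range as the alert's "Current value is 101420"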

1 Rookie

 • 

117 Posts

23-05-2024 19:24

@qwerty51 That makes me wonder where it comes from, if you say you have a fairly small environment? Our largest environment is doing 50+K jobs a day, with 1000+ clients. So is there something that is maybe running way more often than you would expect? For example probe-based jobs running every minute, while the backup itself runs far less often (but those probes would not even count as a save job)?

You should also be able to see in the logs whether certain workflows are run way more often than expected.

I don't know whether you redirect backup output to separate logs, or whether it all ends up in the policy_notifications.log file (the default)? This should also be mentioned in the daemon.raw (*).

(*) We render the daemon.raw at runtime into a daemon.log, rolled over each day, so that one can easily look into the logfile without having to run nsr_render_log each and every time, which we regard as way too cumbersome...

https://www.dell.com/support/manuals/en-us/networker/nw_p_security_config_guide_19.9/rendering-raw-log-files-at-runtime?guid=guid-1f65e4f7-88aa-4671-a6a5-387268ce8441&lang=en-us
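
If runtime rendering is not configured, a one-off manual render also works (a sketch; the path assumes the default /nsr/logs location):

# render the raw daemon log into a human-readable file once
nsr_render_log /nsr/logs/daemon.raw > /tmp/daemon.log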
