This post is more than 5 years old
NetWorker Process Stalls
Hi All,
In my shop we have NetWorker 8.2.3.1 running on Red Hat Enterprise Linux (RHEL) 6.6 with the 531 kernel. Recently we started facing problems with hung processes.
We also found that the NetWorker processes are going into the futex_wait_queue_me state. Has anyone faced the same kind of issue?
Please let me know how to fix this.
StuartWhitby
August 4th, 2016 06:00
I assume you got the futex_wait_queue_me state from strace. If so, there should also be a further argument that says what it's waiting for: either something from another process or thread, or a timeout value to be hit.
You'd need to cross-reference the process and its threads with the others to identify which one is locking things up. The case I've historically seen, which should be fixed in 8.2.3, is someone on a distant system running a large mminfo query that the single-threaded mdb took ages to respond to. However, without a decent amount of additional information, I don't think the community will be able to help. And when you're looking for help with strace output, you'd be far better off learning to work through it yourself than waiting for the community: you can't afford the response delay, and the response will invariably be a request for more info.
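The cross-referencing step can be sketched in Python 3. This is a minimal sketch, assuming strace was run with `-f` (for example `strace -f -p <pid> -o trace.txt`) so that each line carries a thread id; the line shapes it parses are an assumption, because strace's formatting differs between versions. It groups the waiting threads by futex address, so several threads blocked on the same address with a NULL timeout (waiting indefinitely) stand out. `FUTEX_RE` and `futex_waiters` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed strace -f line shapes: "[pid 1234] futex(...)" on stderr, or
# "1234  futex(...)" when written with -o. Adjust for your strace version.
FUTEX_RE = re.compile(
    r"^(?:\[pid\s+(?P<bpid>\d+)\]\s+|(?P<pid>\d+)\s+)?"
    r"futex\((?P<uaddr>0x[0-9a-fA-F]+),\s*(?P<op>[A-Z_|]+),\s*(?P<val>-?\d+)"
    r"(?:,\s*(?P<timeout>NULL|\{[^}]*\}))?"
)


def futex_waiters(lines):
    """Group FUTEX_WAIT calls by futex address.

    Returns {uaddr: [(pid, timeout), ...]}. pid is None when the line has no
    pid prefix; timeout is None when it was NULL (wait forever) or absent.
    """
    waiters = defaultdict(list)
    for line in lines:
        m = FUTEX_RE.match(line.strip())
        if not m or "FUTEX_WAIT" not in m.group("op"):
            continue  # another syscall, or a futex op that is not a wait
        pid = m.group("bpid") or m.group("pid")
        timeout = m.group("timeout")
        waiters[m.group("uaddr")].append(
            (int(pid) if pid else None,
             None if timeout in (None, "NULL") else timeout)
        )
    return dict(waiters)
```

Usage: `futex_waiters(open("trace.txt"))`. Note that this only shows who is waiting; to find the thread that should release a lock, look for the thread that later calls `FUTEX_WAKE` on the same address.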
bingo.1
August 4th, 2016 06:00
The question is: which processes exactly hang? Is this a core (nsr...) issue or a NW Admin GUI issue?
Some processes (like nsrmmd) can simply be killed; the server (nsrd) will restart them.
The safest way is to shut down all NW daemons properly (nsr_shutdown). If that does not help, kill the NW processes (their initiators, nsrd and nsrexecd).
To find the problem, you must be more specific. You might also consider asking EMC support.
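If nsr_shutdown does not help and you end up killing the daemons by hand, a small Python 3 helper can at least make sure only NetWorker processes are hit. A minimal sketch: it matches the `Name:` field of `/proc/<pid>/status` against the daemon names mentioned in this thread (`nsrd`, `nsrexecd`, `nsrmmd`; the set is an assumption to adjust) and only reports what it would do unless `dry_run=False`. All function names are hypothetical.

```python
import os
import signal
from pathlib import Path

# Daemon names taken from this thread; adjust for your installation.
NW_DAEMONS = {"nsrd", "nsrexecd", "nsrmmd"}


def process_name(status_text):
    """Extract the Name: field from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return None


def read_statuses(proc_root="/proc"):
    """Map pid -> /proc/<pid>/status text for every readable process."""
    statuses = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            statuses[int(entry.name)] = (entry / "status").read_text()
        except OSError:
            continue  # the process exited while we were scanning
    return statuses


def select_targets(statuses, names=NW_DAEMONS):
    """Return sorted [(pid, name)] for processes whose name is in names."""
    targets = []
    for pid, text in statuses.items():
        name = process_name(text)
        if name in names:
            targets.append((pid, name))
    return sorted(targets)


def kill_networker(sig=signal.SIGTERM, dry_run=True):
    """Signal every matching daemon; with dry_run, only report them."""
    targets = select_targets(read_statuses())
    for pid, name in targets:
        if dry_run:
            print(f"would send {signal.Signals(sig).name} to {name} (pid {pid})")
        else:
            os.kill(pid, sig)
    return targets
```

Run `kill_networker()` first to see the list, then `kill_networker(dry_run=False)` as root. SIGTERM is used rather than SIGKILL so the daemons get a chance to clean up.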
Vishnuaabe
August 4th, 2016 06:00
Yes, it's the core nsr process (nsrexecd). When we do a graceful shutdown of the NetWorker services using "networker stop" and "nsr_shutdown", the services go down gracefully.
But when we bring them back up using "networker start", backups run fine for about 30 minutes and then the nsrexecd process stalls again.
StuartWhitby
August 4th, 2016 08:00
If NetWorker shuts down gracefully, then nsrexecd is not hung by itself. I doubt it has ever actually been hung; it's just waiting for a reason to do something. nsrexecd is not a core NetWorker server process: it brokers access to other services, but the main server process is nsrd. If nsrd is hung, nsrexecd is unlikely to receive any instructions, which leaves it sitting in the futex_wait_queue_me state until it gets something to do.
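Following this reasoning, it is worth looking at nsrd's threads as well as nsrexecd's. `futex_wait_queue_me` is the kernel function a thread sleeps in while waiting on a futex, and Linux exposes a thread's wait channel in `/proc/<pid>/task/<tid>/wchan` (the same value `ps` shows in its WCHAN column). A minimal Python 3 sketch, assuming `pgrep` is installed and the script runs as root (other users may only see `0`); `thread_wchans`, `summarize` and `report` are hypothetical names.

```python
import subprocess
from collections import Counter
from pathlib import Path


def thread_wchans(pid, proc_root="/proc"):
    """Count the kernel wait channel of every thread of one process."""
    counts = Counter()
    for task in Path(proc_root, str(pid), "task").iterdir():
        try:
            wchan = (task / "wchan").read_text().strip()
        except OSError:
            continue  # thread exited or access was denied
        # "0" or "" means the thread is running or the symbol is hidden
        counts[wchan if wchan not in ("", "0") else "<none>"] += 1
    return counts


def summarize(name, counts):
    """One line per process: thread total, then wait channels by frequency."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    parts = ", ".join(f"{wchan}={n}" for wchan, n in ranked)
    return f"{name}: {sum(counts.values())} threads; {parts}"


def report(names=("nsrd", "nsrexecd")):
    """Print a wait-channel summary for every process with an exact name match."""
    for name in names:
        # pgrep -x matches the whole process name, not a substring
        result = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
        for pid in result.stdout.split():
            try:
                print(f"[{pid}] " + summarize(name, thread_wchans(int(pid))))
            except FileNotFoundError:
                print(f"[{pid}] {name}: exited before it could be read")
```

Calling `report()` a few times during the 30 minutes before the stall may show whether nsrd's threads are stuck as well, which would point upstream of nsrexecd.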
bingo.1
August 4th, 2016 13:00
I don't know whether this helps, but it's at least worth trying:
- stop NW daemons
- delete the /nsr/res/nsrladb directory
- restart NW
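The three steps above could be scripted roughly as follows. This is a minimal Python 3 sketch, assuming the init script lives at `/etc/init.d/networker` (the thread only says "networker stop"/"networker start") and the default `/nsr` location. Unlike the suggestion above, it renames nsrladb to a timestamped copy instead of deleting it, so it can be restored. `plan` and `execute` are hypothetical names, and the sketch does not itself check that every daemon has exited before the move.

```python
import shutil
import subprocess
from pathlib import Path

# Assumptions: the init script path and the default /nsr location.
INIT_SCRIPT = "/etc/init.d/networker"
NSRLADB = Path("/nsr/res/nsrladb")


def plan(stamp, init_script=INIT_SCRIPT, nsrladb=NSRLADB):
    """Return the stop / move-aside / start steps without running them."""
    backup = nsrladb.with_name(f"{nsrladb.name}.{stamp}")
    return [
        ("run", [init_script, "stop"]),
        ("move", (nsrladb, backup)),  # keep a copy instead of deleting it
        ("run", [init_script, "start"]),
    ]


def execute(steps, dry_run=True):
    """Print the steps, or (dry_run=False, as root) actually perform them."""
    for kind, arg in steps:
        if dry_run:
            print(kind, arg)
        elif kind == "run":
            subprocess.run(arg, check=True)  # stop here if the script fails
        else:
            src, dst = arg
            shutil.move(str(src), str(dst))
```

For example, `execute(plan("20160804"))` prints the steps; passing `dry_run=False` performs them.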