4 Operator


14.4K Posts

May 4th, 2006 01:00

The first thing to do is to put that client into a separate group. Have that group start at the same time and write to the same pool as the original one. That way you won't affect the other clients, since the newly created group will contain only this client.
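The group setup can also be scripted instead of done through the GUI; a minimal sketch using nsradmin, where the group and client names are made up for illustration:

```shell
# Hedged sketch: create the one-client test group non-interactively.
# "debug-test" and "problem-client" are illustrative names; adjust
# the group's start time and pool afterwards to match the original.
nsradmin -i - <<'EOF'
create type: NSR group; name: debug-test
. type: NSR client; name: problem-client
update group: debug-test
EOF
```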

Then, when it happens again, copy the contents of the ..\nsr\tmp directory from the backup server to a safe location. Check whether you get RPC responses in both directions, server-to-client and client-to-server. Then try to stop/kill the hanging backup. Then try a client-initiated backup in debug mode and capture the output. With luck it will hang again. Then send all of those outputs to support.
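The capture step can be sketched as a small script. This is a sketch only, assuming the server-side directory is /nsr/tmp; the destination path is illustrative:

```shell
#!/bin/sh
# Sketch: snapshot the server's nsr/tmp directory to a timestamped
# safe location BEFORE killing the hung backup, so nothing in it is
# lost when the daemons clean up.
snapshot_tmp() {
    src="$1"                                 # e.g. /nsr/tmp
    dest="$2/nsr-tmp.$(date +%Y%m%d%H%M%S)"  # timestamped copy
    mkdir -p "$dest" &&
    cp -Rp "$src/." "$dest/" &&
    echo "$dest"                             # report where the copy went
}

# Example (paths are assumptions):
#   snapshot_tmp /nsr/tmp /var/crash
```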

8 Posts

May 4th, 2006 09:00

Unfortunately I hit Post Message twice last night and have two threads going with this subject. As I posted on the other one, the thread below seems similar to my own issue:

http://forums.emc.com/forums/thread.jspa?threadID=32123&tstart=160

No resolution was listed for this post, though.

8 Posts

May 4th, 2006 09:00

Hrvoje,

Thanks for the reply. Here's where I'm at. I'm testing the client (which is still hung) in its own savegroup. Here are the results of the rpcinfo command on both client and server:

Server:

root@tanya:/nsr/tmp.save# rpcinfo
program version netid address service owner
100000 4 ticots tanya.rpc rpcbind superuser
100000 3 ticots tanya.rpc rpcbind superuser
100000 4 ticotsord tanya.rpc rpcbind superuser
100000 3 ticotsord tanya.rpc rpcbind superuser
100000 4 ticlts tanya.rpc rpcbind superuser
100000 3 ticlts tanya.rpc rpcbind superuser
100000 4 tcp 0.0.0.0.0.111 rpcbind superuser
100000 3 tcp 0.0.0.0.0.111 rpcbind superuser
100000 2 tcp 0.0.0.0.0.111 rpcbind superuser
100000 4 udp 0.0.0.0.0.111 rpcbind superuser
100000 3 udp 0.0.0.0.0.111 rpcbind superuser
100000 2 udp 0.0.0.0.0.111 rpcbind superuser
100024 1 tcp 0.0.0.0.192.0 status superuser
100024 1 udp 0.0.0.0.192.7 status superuser
100021 1 tcp 0.0.0.0.192.1 nlockmgr superuser
100021 1 udp 0.0.0.0.192.8 nlockmgr superuser
100021 3 tcp 0.0.0.0.192.2 nlockmgr superuser
100021 3 udp 0.0.0.0.192.9 nlockmgr superuser
100021 4 tcp 0.0.0.0.192.3 nlockmgr superuser
100021 4 udp 0.0.0.0.192.10 nlockmgr superuser
100020 1 udp 0.0.0.0.15.205 llockmgr superuser
100020 1 tcp 0.0.0.0.15.205 llockmgr superuser
100021 2 tcp 0.0.0.0.192.4 nlockmgr superuser
100068 2 udp 0.0.0.0.192.55 cmsd superuser
100068 3 udp 0.0.0.0.192.55 cmsd superuser
100068 4 udp 0.0.0.0.192.55 cmsd superuser
100068 5 udp 0.0.0.0.192.55 cmsd superuser
805306352 1 tcp 0.0.0.0.2.224 - superuser
100005 1 udp 0.0.0.0.192.99 mountd superuser
100005 3 udp 0.0.0.0.192.99 mountd superuser
100005 1 tcp 0.0.0.0.192.212 mountd superuser
100005 3 tcp 0.0.0.0.192.212 mountd superuser
100003 2 udp 0.0.0.0.8.1 nfs superuser
100003 3 udp 0.0.0.0.8.1 nfs superuser
100003 2 tcp 0.0.0.0.8.1 nfs superuser
100003 3 tcp 0.0.0.0.8.1 nfs superuser
100083 1 tcp 0.0.0.0.0.0 ttdbserver superuser
390113 1 tcp 0.0.0.0.31.1 nsrexec unknown
390103 2 tcp 0.0.0.0.31.19 nsrd unknown
390109 2 tcp 0.0.0.0.31.19 nsrstat unknown
390110 1 tcp 0.0.0.0.31.19 nsrjb unknown
390120 1 tcp 0.0.0.0.31.19 - unknown
390103 2 udp 0.0.0.0.31.60 nsrd unknown
390109 2 udp 0.0.0.0.31.60 nsrstat unknown
390110 1 udp 0.0.0.0.31.60 nsrjb unknown
390120 1 udp 0.0.0.0.31.60 - unknown
390105 5 tcp 0.0.0.0.31.23 nsrindexd unknown
390105 6 tcp 0.0.0.0.31.23 nsrindexd unknown
390107 5 tcp 0.0.0.0.31.53 nsrmmdbd unknown
390107 6 tcp 0.0.0.0.31.53 nsrmmdbd unknown
390104 105 tcp 0.0.0.0.31.46 nsrmmd unknown
390104 205 tcp 0.0.0.0.31.62 nsrmmd unknown
390104 405 tcp 0.0.0.0.31.52 nsrmmd unknown
390104 605 tcp 0.0.0.0.31.31 nsrmmd unknown
1342177279 4 tcp 0.0.0.0.195.176 - superuser
1342177279 1 tcp 0.0.0.0.195.176 - superuser
1342177279 3 tcp 0.0.0.0.195.176 - superuser
1342177279 2 tcp 0.0.0.0.195.176 - superuser
1342177280 4 tcp 0.0.0.0.237.68 - superuser
1342177280 1 tcp 0.0.0.0.237.68 - superuser
1342177280 3 tcp 0.0.0.0.237.68 - superuser
1342177280 2 tcp 0.0.0.0.237.68 - superuser

Client:

Z:\>rpcinfo -p localhost
program vers proto port
100000 2 tcp 7938
100000 2 udp 7938
390113 1 tcp 7937

I am currently attempting to use the dbgcommand utility (provided to us by Legato support) to debug the currently running savegroup.

root 17875 17802 0 11:55:26 ? 0:00 /opt/networker/bin/nsrexec -c erecdevweb-c2 -a -- erecdevweb-c
root 17873 17802 0 11:55:26 ? 0:00 /opt/networker/bin/nsrexec -c erecdevweb-c2 -a -- erecdevweb-c
root 17996 1559 1 11:55:43 pts/0 0:00 grep dev
root 17874 17802 0 11:55:26 ? 0:00 /opt/networker/bin/nsrexec -c erecdevweb-c2 -a -- erecdevweb-c
root 17872 17802 0 11:55:26 ? 0:00 /opt/networker/bin/nsrexec -c erecdevweb-c2 -a -- erecdevweb-c

root@tanya:/tmp# ./dbgcommand.dat -p 17872 Debug=9
root@tanya:/tmp# ./dbgcommand.dat -p 17873 Debug=9
root@tanya:/tmp# ./dbgcommand.dat -p 17874 Debug=9
root@tanya:/tmp# ./dbgcommand.dat -p 17875 Debug=9



We've had no success getting any additional info into the daemon logs using this program. If you have any other means of "debugging" the savegroup on the client or server, please let me know. Thanks again!

4 Operator


14.4K Posts

May 4th, 2006 13:00

Hi Dave,

You should test RPC from each side against the other. So, from the client run "rpcinfo -p server" and from the server run "rpcinfo -p client". This will probably work. Then test nsrexecd on both sides:
- from client do: echo print | nsradmin -p 390113 -i - -s server_name
- from server do: echo print | nsradmin -p 390113 -i - -s client_name

I expect this will work too, but we'd better check it out.
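Put together, the two-way check looks like this; run it once on each machine with PEER set to the other host (the variable name is just a placeholder):

```shell
# Sketch: run on the server with PEER=<client>, then on the client
# with PEER=<server>. Both commands should answer promptly; a hang
# or timeout here points at the transport, not at savegrp.
PEER=other_host_name

rpcinfo -p "$PEER"                               # OS portmapper (port 111)
echo print | nsradmin -p 390113 -i - -s "$PEER"  # nsrexecd (program 390113)
```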

I would suggest putting nsrd into debug mode - this will put all the NW processes into debug as well. I don't think the problem is with the savegrp binary, but rather something else.
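Two common ways to do that are sketched below. This assumes the dbgcommand utility mentioned earlier also works against nsrd, and that your nsrd accepts a -D startup level; the exact procedure varies by NetWorker release, so verify it with support first:

```shell
# Option 1 (assumption): raise the debug level on the running nsrd
# with the same dbgcommand utility used earlier. The PID lookup is
# illustrative only.
NSRD_PID=$(ps -ef | awk '/[n]srd/ {print $2; exit}')
./dbgcommand.dat -p "$NSRD_PID" Debug=9

# Option 2 (assumption): restart the server daemons with a startup
# debug level; per the advice above, the child NW processes pick up
# the debug setting as well.
nsr_shutdown
nsrd -D9
```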

I have never before seen ticots (loopback transport providers) registered with the portmapper - no idea whether that could mess things up or not - I guess not (I really have no experience with it). My guess is it is not relevant (otherwise the other clients would have the same problem). Given that only one client is giving you trouble, here is a set of perhaps silly questions, but you never know...
- is that client specific in anything (e.g. does it run some particular application)?
- do you have the latest patch level on it?
- has the client been rebooted since the trouble started - do you see a longer period of things going well after a reboot, or is there no connection?

For savegrp itself, use truss/(s)trace just to see what it gives you before any debug steps. Do you see it looping on something, or sleeping and doing nothing? When you go to the client while things are bad, do you see activity from save.exe? Does it use CPU at a higher rate than usual? Is there anything in the event logs? While the save.exe started from the server is hanging, are you able to start a save from the client?
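For example, on the Solaris server it might look like this (the output path and PID lookup are illustrative; on a Linux box you would use strace -f -p instead):

```shell
# Sketch: attach truss to the hung savegrp and log its system calls;
# -f follows child processes, -o writes the trace to a file you can
# hand to support.
SAVEGRP_PID=$(ps -ef | awk '/[s]avegrp/ {print $2; exit}')
truss -f -o /tmp/savegrp.truss -p "$SAVEGRP_PID"
# Repeating syscalls suggest it is looping; sitting in poll()/read()/
# pause() with no new output suggests it is sleeping on something.
```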

When you stop the savegrp on the server, do you see save.exe going away on the client? If not, I would suspect a possible issue with the file system on that client (I've seen that a few times).