Unsolved
17 Posts
0
1426
April 13th, 2007 05:00
hanging nsrmmd on Linux storage node
Hi everybody,
In our environment we have a serious problem with hanging nsrmmd processes on our Linux storage nodes.
Our Networker server runs W2K3 with Networker 7.3.2 jumbo patch; our Linux storage nodes run Redhat 3.0 and 4.0, also with Networker 7.3.2 jumbo patch.
Sometimes a bad tape causes a read/write (I/O) error on our Linux nodes, and we have to unload the tape via sjimm because unloading via Networker won't work. The nsrmmd on the storage node then keeps hanging, and the drive cannot be used again.
We tried restarting the drive, the library, and the Networker processes on the storage node, and killing nsrmmd with "kill -9", but nothing stops the process. The only thing that works is rebooting the storage node.
Any ideas what I can do to avoid rebooting the Linux node?
Thank You,
Christian


ble1
6 Operator
•
14.4K Posts
•
56.2K Points
0
April 13th, 2007 06:00
vsemaska
194 Posts
0
April 13th, 2007 06:00
Vic
dk3
163 Posts
0
April 13th, 2007 09:00
Are you sure that Networker uses the right drives when accessing the tapes? Just today I had an issue that sounds quite similar to yours on a Windows storage node. I suspect a persistent binding issue on the storage node, but I have not had time to investigate it more closely yet. I also think we enabled persistent binding on our Windows storage node, but the behaviour is completely different...
Could you check the following: if you have two drives on your storage node, run an inventory on two tapes that are already in Networker's index, one tape in each drive. If it is a persistent binding problem, you will probably get a message along the lines of
'Found tape XXX1, expected tape XXX2, updating Networker database'
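The comparison behind that check can be sketched in a few lines of Python. This is only an illustration, not anything Networker provides: the device paths and labels are placeholders, and the "found" labels are assumed to be copied by hand from the inventory output.

```python
def find_label_mismatches(expected, found):
    """Return {drive: (expected_label, found_label)} for drives whose labels differ."""
    return {
        drive: (expected[drive], found.get(drive))
        for drive in expected
        if found.get(drive) != expected[drive]
    }


def looks_like_swapped_drives(mismatches):
    """True when the mismatching drives simply hold each other's tapes,
    which is the pattern a wrong device-to-drive binding would produce."""
    if len(mismatches) < 2:
        return False
    expected_labels = sorted(exp for exp, _ in mismatches.values())
    found_labels = sorted(str(fnd) for _, fnd in mismatches.values())
    return expected_labels == found_labels


if __name__ == "__main__":
    # Hypothetical values: what the index says vs. what the inventory reported.
    expected = {"/dev/nst0": "XXX1", "/dev/nst1": "XXX2"}
    found = {"/dev/nst0": "XXX2", "/dev/nst1": "XXX1"}
    mismatches = find_label_mismatches(expected, found)
    print(mismatches)
    print(looks_like_swapped_drives(mismatches))
```

If every drive reports the label that was expected, the mismatch dictionary is empty and a binding problem becomes less likely.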
best regards.
restorovic
17 Posts
0
April 18th, 2007 06:00
Thank you for your suggestions.
I'm not too familiar with Linux, so sorry if my answers are not very exact.
To your questions:
- Our kernel release is 2.4.21-47.0.1.ELsmp, but from what I have seen so far, I think it could happen with other versions, too.
- If it really is an I/O issue with a process in an uninterruptible state, and neither NW nor the operating system can resolve it without a reboot, our only option would be to keep nsrmmd from getting into this state. But how?
- As far as I can see, NW uses the right drives. We checked this with sjimm and mt, and everything seems to be right.
Finally, one question: on your Linux storage nodes, do you use /dev/nst* for your remote devices? And does anybody use the "scsidev" utility?
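For anyone who wants to confirm the uninterruptible state before rebooting, here is a minimal Python sketch that reads the state letter from /proc/<pid>/stat. A process shown as "D" is blocked inside the kernel waiting for I/O, and signals (including the one sent by "kill -9") are not acted on until it leaves that state. The function names are made up for this example, and it assumes a Linux /proc filesystem.

```python
from pathlib import Path
from typing import List


def state_from_stat(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')' characters, so take everything after the LAST ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def uninterruptible_pids(name: str, proc_root: str = "/proc") -> List[int]:
    """List PIDs whose command name equals `name` and whose state is 'D'."""
    root = Path(proc_root)
    if not root.is_dir():
        return []
    hits = []
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            stat_line = (entry / "stat").read_text(errors="replace")
            comm = stat_line[stat_line.index("(") + 1:stat_line.rindex(")")]
            state = state_from_stat(stat_line)
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was not parseable
        if comm == name and state == "D":
            hits.append(int(entry.name))
    return sorted(hits)


if __name__ == "__main__":
    print(uninterruptible_pids("nsrmmd"))
```

A non-empty list means at least one nsrmmd is stuck in the kernel, which points at the driver, SCSI, or SAN layer rather than at NW itself.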
Yours,
Christian
ble1
6 Operator
•
14.4K Posts
•
56.2K Points
0
April 18th, 2007 07:00
> ... uninterruptible state and neither NW nor operating system can handle the issue without rebooting our only way would be to avoid nsrmmd to get in this state. But how?
What happens to nsrmmd is actually a consequence, so the cause should be looked for elsewhere. Usually that is the SAN, but with Linux it can be the kernel too. Finding the cause is not easy and might involve expensive tools such as SCSI/SAN sniffers.
> ... nodes, do you use /dev/nst* for your remote devices? And does anybody use the "scsidev"-utility?
So far I use nst, and there has been no need for scsidev (I believe due to persistent binding and/or udev usage).
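To see which nst node a persistent name currently points at, something like the following Python sketch could help. It assumes the udev rules on the system create symlinks under /dev/tape/by-id, which may not be the case on older releases such as the ones discussed in this thread; the directory can be swapped for wherever the persistent names live.

```python
import os


def resolve_tape_links(link_dir="/dev/tape/by-id"):
    """Map each symlink in link_dir to the device node it currently points at."""
    if not os.path.isdir(link_dir):
        return {}  # no persistent names on this system
    mapping = {}
    for name in sorted(os.listdir(link_dir)):
        path = os.path.join(link_dir, name)
        if os.path.islink(path):
            mapping[name] = os.path.realpath(path)
    return mapping


if __name__ == "__main__":
    for name, target in resolve_tape_links().items():
        print(f"{name} -> {target}")
```

Comparing this output before and after a reboot or a SAN change shows whether the /dev/nst* numbering has moved between drives.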