May 10th, 2010 05:00

Networker Server can't decode arguments error?

Hi,

Networker Server (NetWorker Management Console version 3.5.2.Build.477 based on NetWorker version 7.5.2.Build.477) is fibre attached through a SAN switch to a Clariion CX3-40f and a Quantum Scalar i2000 tape library. All backups are directed through the Networker Server (storage nodes have been disabled) directly to the EDL, and then cloned off to tape.

We have been experiencing a problem with scheduled and manual cloning. All backups to the EDL complete successfully, but the daily and weekly clones of the resulting savesets do not all complete. Further investigation shows that the clones of the daily/weekly backup of the Networker server itself all complete successfully, and most of the time 2 out of 4 Exchange storage group backups get cloned off to tape, but all the other client backups (50+) fail to clone to tape.

There have been no changes on the server that we are aware of, aside from an upgrade of Networker from 7.4.4 to 7.5.2.2 (as suggested by EMC support), which has not resolved the problem. Before the problem started we had to delete and rebuild one of the jukeboxes due to a drive ordering issue; the rebuild completed with no issues. Manual cloning via the GUI (NMC > Media > Savesets, where the savesets created in last night's backups show as "browsable" or "recoverable") also fails. When we run the clone I can see the EDL volume getting mounted into a drive, along with a physical tape in the Scalar i2k. Both tapes sit there for about 2-3 minutes (this is correct according to the jukebox timers), then unload, and the operation times out and fails. Cloning a saveset from the command line shows this in more detail:

E:\Program Files\Legato\nsr\bin>nsrclone -vvv -b "Weekly Clone" -S 4276408269
6215:nsrclone: Cloning the following save sets (ssid/cloneid):
        b9ba1066-00000006-fee4cfcd-4be4cfcd-12030000-0ac90808/1273286605
5874:nsrclone: Automatically copying save sets(s) to other volume(s)
6216:nsrclone:
Starting cloning operation...
6217:nsrclone:   ...from storage node: ukhubedl01.plan-int.org
39078:nsrclone: RPC error: Server can't decode arguments

5777:nsrclone: Cannot open nsrclone session with plukhubmgt07.plan-int.org
6218:nsrclone: Cannot open nsrclone session with plukhubmgt07.plan-int.org. Error is 'Server can't decode arguments'

5882:nsrclone: Failed to clone any save sets

E:\Program Files\Legato\nsr\bin>
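
For anyone scripting around this, the output above can be split into message IDs and text so that this specific failure is easy to spot in a batch of clones. This is only a minimal Python sketch: the function names are made up for illustration, and treating the IDs seen in this thread (5884 for success) as stable across NetWorker versions is an assumption.

```python
import re

# Line format taken from the nsrclone -vvv output in this thread:
#   <message id>:nsrclone: <message text>
LINE_RE = re.compile(r"^(\d+):nsrclone:\s*(.*)$")

# ID copied from the successful run in this thread; assumed, not documented.
SUCCESS_ID = 5884
RPC_ERROR_TEXT = "Server can't decode arguments"


def parse_nsrclone_output(text):
    """Return a list of (message_id, message) tuples found in the output."""
    messages = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            messages.append((int(match.group(1)), match.group(2)))
    return messages


def classify(text):
    """Classify one run as 'success', 'rpc_decode_error' or 'unknown'."""
    messages = parse_nsrclone_output(text)
    if any(msg_id == SUCCESS_ID for msg_id, _ in messages):
        return "success"
    if any(RPC_ERROR_TEXT in msg for _, msg in messages):
        return "rpc_decode_error"
    return "unknown"
```

Lines that do not start with a message ID (such as "Starting cloning operation...") are simply skipped.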

However when the tapes are pre-mounted first and the above command is run again (or if I run the same command after the operation has failed the first time) it works without any problem:

E:\Program Files\Legato\nsr\bin>nsrclone -vvv -b "Weekly Clone" -S 4276408269
6215:nsrclone: Cloning the following save sets (ssid/cloneid):
        b9ba1066-00000006-fee4cfcd-4be4cfcd-12030000-0ac90808/1273286605
5874:nsrclone: Automatically copying save sets(s) to other volume(s)
6216:nsrclone:
Starting cloning operation...
5884:nsrclone: Successfully cloned all requested save sets
5886:nsrclone: Clones were written to the following volume(s):
        000152

The tape status under Media> Savesets changes to "browsable has clones". I have tested this for a number of different savesets for different clients, all with the same results.
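
Since a second run of the same command succeeds, a stopgap (not a fix) could be to retry automatically when this specific error appears. A minimal Python sketch, assuming `nsrclone` is on the PATH and using only the flags shown in this thread; the function names, the retry count and the wait time are hypothetical choices, not anything recommended by EMC.

```python
import subprocess
import time

RPC_ERROR_TEXT = "Server can't decode arguments"


def run_nsrclone(pool, ssid):
    """Run nsrclone with the flags used in this thread; return combined output."""
    result = subprocess.run(
        ["nsrclone", "-vvv", "-b", pool, "-S", str(ssid)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.stdout


def clone_with_retry(pool, ssid, attempts=2, wait_seconds=30,
                     runner=run_nsrclone, sleep=time.sleep):
    """Retry only when the RPC decode error is seen.

    Returns (attempt_number, output) for the last attempt made. The caller
    still has to inspect the output to confirm the clone succeeded.
    """
    output = ""
    for attempt in range(1, attempts + 1):
        output = runner(pool, ssid)
        if RPC_ERROR_TEXT not in output:
            return attempt, output
        if attempt < attempts:
            sleep(wait_seconds)  # give the volumes time to finish mounting
    return attempts, output
```

For example, `clone_with_retry("Weekly Clone", 4276408269)` would repeat the command above once if the first attempt hits the RPC error. The `runner` and `sleep` parameters exist only so the logic can be exercised without a NetWorker server.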

Analysis:

- testing seems to indicate that:

     - if the tapes required for reading or writing are not ready, Networker gives the error:

          39078:nsrclone: RPC error: Server can't decode arguments

     - if the tapes are preloaded and ready, then the clone completes without problems


- setting the idle device timeout=0 and increasing the load sleep to 2 minutes has no impact on the problem (we have tried various other load/unload/idle timeout sleep parameters to no effect).

We had this problem on Networker server version 7.4.4. We have tried various things like ensuring the consistency of the Networker catalog:

nsrim -X

nsrck -m

nsrck -L6

We have also stopped the services, renamed the "tmp" folder and restarted them, performed more integrity checks and we still have the same issue. We have checked all the flags are ok for the savesets using "mminfo -avq -S". EMC suggested a corruption of the binaries, which have now all been replaced with the upgrade to 7.5 SP2.

Can you suggest a cause for this problem and a resolution? I'm thinking it's a timeout issue, but I would have thought the binaries would be the cause?

I have tried increasing the debug trace level for the jukebox to 5 and then running the operation again, however no more information on this error is recorded in the rendered daemon.raw.

Thanks.

53 Posts

May 11th, 2010 12:00

Has anyone got any ideas on what could be causing this problem?

14.3K Posts

June 3rd, 2010 01:00

I have seen things like this when readhostname was not set correctly (when doing it from CLI).  I assume you mount tapes on ukhubedl01.plan-int.org when it works?  Is this correct?

2 Posts

August 27th, 2010 13:00

Just wondering if you received a fix for this problem? We're experiencing the exact same problem running Networker 7.5.1. The cloning process receives the RPC error with the "can't decode arguments" verbiage. I realize this was from some time ago but thought you might remember how this was resolved.

Thanks

53 Posts

August 27th, 2010 13:00

Hi there,

We never discovered the true cause of the problem. However, we upgraded the server to 7.5.3, then stopped the Networker services on the Networker server, recreated the \legato\nsr\tmp folder and restarted the services. After that all manual/automatic cloning seemed to work fine again and we never saw the "can't decode arguments" message again.

We had an EMC SME engineer examine the old \tmp folder however they were unable to find anything wrong.
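
For reference, the folder step can be sketched in Python so the old tmp folder is kept for inspection rather than deleted. This assumes the Networker services have already been stopped by hand (stopping and starting them is deliberately left out), and the function name and the `tmp.old-<timestamp>` naming are made up for illustration.

```python
import os
import time


def rotate_tmp(nsr_dir, stamp=None):
    """Rename <nsr_dir>/tmp to tmp.old-<stamp> and create a fresh empty tmp.

    Assumes the NetWorker services are stopped before this is called.
    Returns the path of the renamed (old) folder.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    tmp_dir = os.path.join(nsr_dir, "tmp")
    old_dir = os.path.join(nsr_dir, "tmp.old-" + stamp)
    if os.path.exists(old_dir):
        # Refuse to overwrite an earlier copy of the folder.
        raise FileExistsError(old_dir)
    os.rename(tmp_dir, old_dir)
    os.mkdir(tmp_dir)
    return old_dir
```

On the server in this thread `nsr_dir` would be the `nsr` folder under the Legato install path; the services are restarted once the new empty tmp folder exists.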

I'm still not totally convinced this was the cause, but it seemed to work for us.

Hope this helps!

2 Posts

August 30th, 2010 08:00

Hey,
Thanks for the quick response.  So it doesn't sound like you found the root cause of this particular problem.   I'll keep your resolution in mind and open a case with EMC as I found a couple of hits in Powerlink but your situation sounded so similar.    Thanks again!