1 Rookie
•
66 Posts
0
4667
October 1st, 2015 09:00
Networker suddenly unable to clone
This one is odd, bear with me.
Environment:
Networker 8.2.0.4 server (srv00) AIX 7.1
2 networker storage nodes (srv01, srv02) AIX 7.1
Data Domain DD860 5.5.0.9
Sun Storwize L700 Tape Library
Most of our backups go to any of 3 DD Boost Devices, 1, 2 and 3. They all run fine, no issue. All 3 are available on all 3 storage nodes.
When I start my clone cycle, the clones run fine: they read the data and write the clone to the tape library.
At some point, usually after several days (5-6), clones start to fail. Already running clones continue to run, and new backups continue to run. However, any clones that were waiting for a tape drive, and any clones that kick off (even if they ran fine yesterday), suddenly fail. I'm still actively reading from ddboost devices 1, 2 and 3. I'm also still writing to tape. The error I get is:
80470:nsrclone: Following volumes are needed for cloning
80471:nsrclone: ddboost1.001 (Regular)
100907:nsrd: no matching data domain devices for save of client `srv00.nysed.gov'; check storage nodes, devices or pools
77181:nsrclone: Cannot open nsrclone session with srv00.nysed.gov. Error is 'No destination device is available for the cloning operation.'
79625:nsrclone: Failed to clone any Regular save sets
77181:nsrclone: Cannot open nsrclone session with srv00.nysed.gov. Error is 'No destination device is available for the cloning operation.'
A reboot of the NetWorker server fixes everything.
I tried unchecking dynamic nsrmmds on each storage node. I tried creating an AFTD local to the server, labeling it to the default pool, and running a clone. I get the same error, word for word. It feels like NetWorker clone suddenly can't talk to its devices, even though the devices show as ready. Or perhaps some sort of timeout is exceeded and suddenly causes devices to act offline, but only for cloning. Any thoughts?
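For anyone who wants to catch the moment this starts happening rather than finding out from failed clones, here is a minimal sketch that scans clone output for the messages quoted above. It assumes every line has the `ID:program: text` shape seen in this post, and it treats message IDs 100907 and 77181 as the "no device" signal purely because they appear in this failure. The function names and the `SAMPLE` list are illustrative, not part of NetWorker.

```python
import re
from collections import Counter

# Assumed line shape, taken from the output quoted in this post:
#   <message id>:<program>: <text>
LINE_RE = re.compile(r"^(?P<msg_id>\d+):(?P<program>\w+):\s*(?P<text>.*)$")

# Message IDs from the quoted failure that point at "no usable device".
# This set is an assumption based only on this thread.
NO_DEVICE_IDS = {"100907", "77181"}

# The error output from this post, used as sample input.
SAMPLE = [
    "80470:nsrclone: Following volumes are needed for cloning",
    "80471:nsrclone: ddboost1.001 (Regular)",
    "100907:nsrd: no matching data domain devices for save of client "
    "`srv00.nysed.gov'; check storage nodes, devices or pools",
    "77181:nsrclone: Cannot open nsrclone session with srv00.nysed.gov. "
    "Error is 'No destination device is available for the cloning operation.'",
    "79625:nsrclone: Failed to clone any Regular save sets",
    "77181:nsrclone: Cannot open nsrclone session with srv00.nysed.gov. "
    "Error is 'No destination device is available for the cloning operation.'",
]


def parse_line(line):
    """Return (msg_id, program, text), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("msg_id"), match.group("program"), match.group("text")


def summarize(lines):
    """Count how often each message ID occurs in the given lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts


def looks_like_device_failure(lines):
    """True if any of the assumed "no device" message IDs is present."""
    counts = summarize(lines)
    return any(counts[msg_id] > 0 for msg_id in NO_DEVICE_IDS)


if __name__ == "__main__":
    print(summarize(SAMPLE))
    print(looks_like_device_failure(SAMPLE))
```

Feeding the output of each clone job through `looks_like_device_failure` would at least give a timestamp for when the server enters the bad state.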
bingo.1
2.4K Posts
0
October 1st, 2015 10:00
This is the obvious key message: "100907:nsrd: no matching data domain devices for save ...."
So it seems that you have configured NW to clone to a DD device, not to tape. Check your pool settings, especially for the devices.
Davidtgnome
1 Rookie
•
66 Posts
0
October 1st, 2015 13:00
Sorry, no. I'm starting to believe the error is false. It is not cloning to a DDBoost device.
bingo.1
2.4K Posts
0
October 1st, 2015 18:00
What if the error is right?
Your intention is to clone to tape. But NW seems to ignore it.
So far I have no idea why, but why don't you just test and verify the current behavior?
Create a new DD device and label its media to the clone pool.
Then restart the process and see whether the clone will write to that media.
Of course you can abort the process when you have verified the facts.
What do you have to lose?
Davidtgnome
1 Rookie
•
66 Posts
0
October 2nd, 2015 06:00
I did so. It failed with precisely the same error. Also, it's not cloning TO a DD pool but FROM one. However, I created a destination DD pool just for shits and grins and that failed too. Reboot and it works.
Perhaps I am not being clear:
1. Clones work.
2. Nothing changes.
3. Precisely the same clones fail. Same source, same destination. Some continue to run, but newly executed ones fail.
4. Reboot.
5. The clones all work again.
I have shown this to 4 engineers from EMC since the beginning of September, when I opened the case. None of them have any idea what is going on.
bingo.1
2.4K Posts
0
October 3rd, 2015 02:00
It is clear that you want to clone to tape. Nothing unusual. We do it the exact same way.
On the other hand, I can only rely on the error message you posted.
I am not sure about your clone method. We use scripted cloning and of course we only have tape media labeled for that pool.
If you run a clone process from the command line and override the resources, will the clone then end successfully?
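A command-line test of this kind could be scripted roughly as below. This is only a sketch: it builds an `nsrclone` argument list using the `-s` (server), `-b` (destination pool) and `-S` (save set IDs) options and prints it without running anything. The pool name `Tape Clone` and the save set ID are made-up placeholders, and the helper name is hypothetical.

```python
import shlex


def build_nsrclone_command(server, pool, ssids):
    """Build an nsrclone argument list that names the destination pool explicitly.

    server: NetWorker server name (passed with -s)
    pool:   destination clone pool (passed with -b)
    ssids:  save set IDs to clone (listed after -S)
    """
    if not ssids:
        raise ValueError("at least one save set ID is required")
    command = ["nsrclone", "-s", server, "-b", pool, "-S"]
    command.extend(str(ssid) for ssid in ssids)
    return command


if __name__ == "__main__":
    # Placeholder values: replace with a real pool name and save set ID.
    cmd = build_nsrclone_command("srv00", "Tape Clone", ["1234567890"])
    # Print a shell-safe form for review; nothing is executed here.
    print(shlex.join(cmd))
```

Naming the pool on the command line takes the scheduled clone resources out of the picture, so a failure here points at the server or devices rather than at the clone configuration.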
ble1
6 Operator
•
14.4K Posts
•
56.2K Points
0
October 3rd, 2015 05:00
Automated cloning or via script?
Davidtgnome
1 Rookie
•
66 Posts
0
October 5th, 2015 04:00
Excellent questions. When it happens, it fails for clones on save-group completion, scheduled clones in the NetWorker GUI, and clones at the command line.
While it was broken we ran a test cloning a single saveset to tape, NFS disk space, and DDBoost. All 3 failed with the same error.
Upon reboot, all 3 worked perfectly.
Davidtgnome
1 Rookie
•
66 Posts
0
October 5th, 2015 12:00
Update from Support.
BUG #235187 ESC #23334 nsrmmd not assigned additional sessions after jobs complete
They claim that it's fixed in 8.2 SP2, which was supposed to be released today.
So the error might really mean that no process is available on the storage node to contact the Data Domain device and establish the read side of the conversation. Not the right error, but I suppose it's possible.
ble1
6 Operator
•
14.4K Posts
•
56.2K Points
0
October 7th, 2015 02:00
That fix is part of 8.0.4.4, 8.1.3.2 and 8.2.1.5. I would rather use 8.2.1.8 than 8.2.2.0.
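To check an installed build against those numbers, a small comparison like the following may help. It is a sketch under one assumption: within a given 8.x branch, any build at or above the listed one contains the fix. Branches not mentioned in this thread return `None` rather than a guess. The names `FIXED_IN`, `parse_version` and `has_fix` are illustrative.

```python
# Minimum builds containing the fix, as listed in this thread,
# keyed by (major, minor) branch.
FIXED_IN = {
    (8, 0): (8, 0, 4, 4),
    (8, 1): (8, 1, 3, 2),
    (8, 2): (8, 2, 1, 5),
}


def parse_version(text):
    """Turn a dotted version such as '8.2.0.4' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix(version_text):
    """Return True/False for known branches, None for unknown ones.

    Assumes builds at or above the listed minimum in the same branch
    carry the fix; that is an inference, not a statement from support.
    """
    version = parse_version(version_text)
    minimum = FIXED_IN.get(version[:2])
    if minimum is None:
        return None
    return version >= minimum


if __name__ == "__main__":
    for candidate in ("8.2.0.4", "8.2.1.8", "8.2.2.0"):
        print(candidate, has_fix(candidate))
```

By this check the 8.2.0.4 server from the original post is below the fixed build for its branch.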