uvarovnick

27 Posts

2011

February 21st, 2020 04:00

NetWorker stopped backing up to one of the pools

NetWorker (on Data Domain) has several pools:
# nsrmm -C
Data Domain disk Dataeth.001 mounted on dd6800-vc1.vaz.ru_Data-eth, write enabled
Data Domain disk FA.002 mounted on dd6800-vc1.vaz.ru_FA, write enabled
Data Domain disk Oraclefc.001 mounted on dd6800-vc1.vaz.ru_Oracle-fc, write enabled
Data Domain disk Oracleeth.001 mounted on dd6800-vc1.vaz.ru_Oracle-eth, write enabled
Data Domain disk Index.001 mounted on dd6800-vc1.vaz.ru_Index, write enabled
Data Domain disk Datafc.001 mounted on dd6800-vc1.vaz.ru_Data-fc, write enabled
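The listing above maps each volume to the device it is mounted on (for example, volume Dataeth.001 sits on device dd6800-vc1.vaz.ru_Data-eth), and device-level commands such as the soft reset suggested in this thread take the device name, not the volume name. As a minimal sketch, assuming every line has exactly the "Data Domain disk VOLUME mounted on DEVICE, MODE" shape seen here (other device types or NetWorker versions may print differently), a small Python parser can look up the device for a given volume:

```python
import re
from typing import Dict, Tuple

# Assumed line shape, based only on the listing posted in this thread:
#   Data Domain disk <volume> mounted on <device>, <mode>
LINE_RE = re.compile(
    r"^Data Domain disk (?P<volume>\S+) mounted on (?P<device>[^,\s]+), (?P<mode>.+)$"
)


def parse_mounted_volumes(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each volume name to (device name, mode); non-matching lines are skipped."""
    volumes: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            volumes[match["volume"]] = (match["device"], match["mode"])
    return volumes


LISTING = """\
# nsrmm -C
Data Domain disk Dataeth.001 mounted on dd6800-vc1.vaz.ru_Data-eth, write enabled
Data Domain disk FA.002 mounted on dd6800-vc1.vaz.ru_FA, write enabled
Data Domain disk Index.001 mounted on dd6800-vc1.vaz.ru_Index, write enabled
"""

if __name__ == "__main__":
    device, mode = parse_mounted_volumes(LISTING)["Dataeth.001"]
    print(device, mode)  # dd6800-vc1.vaz.ru_Data-eth write enabled
```

The prompt line "# nsrmm -C" does not match the pattern and is simply ignored.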

Backups stopped going to the Dataeth.001 pool. There are several policies and clients, and not a single backup completes.
Backups to the other pools succeed.
What should I do?

Responses (5)

crazyrov

4 Operator

1.3K Posts

February 25th, 2020 01:00

@uvarovnick, it looks like, for some reason, the NetWorker server is not able to allocate the required device. Can you try a soft reset on the device and then run the backups again? The command is:

nsrmm -Hev -f dd6800-vc1.vaz.ru_Data-eth
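The reset and a follow-up test backup can be chained. Below is a hypothetical Python wrapper (the constants and the build_commands/run helpers are illustrative, not part of NetWorker): it only assembles the nsrmm -Hev -f command given above and the save -b test used elsewhere in this thread, prints them, and executes them only when dry_run=False. Keep in mind that save keeps retrying while the device is busy ("waiting 30 seconds then retrying"), so a live run may not return quickly.

```python
import shlex
import subprocess
from typing import List

# Values taken from this thread; adjust them for your own environment.
DEVICE = "dd6800-vc1.vaz.ru_Data-eth"
POOL = "Data-eth"
TEST_PATH = "/nsr/logs/daemon.raw"


def build_commands(device: str, pool: str, test_path: str) -> List[List[str]]:
    """Return the soft-reset command followed by a short test save to the pool."""
    return [
        ["nsrmm", "-Hev", "-f", device],  # soft reset of the device, as suggested above
        ["save", "-b", pool, test_path],  # short manual backup to the affected pool
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first non-zero exit.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(build_commands(DEVICE, POOL, TEST_PATH))  # dry run: prints only
```

Defaulting to a dry run lets you check the device and pool names before anything touches the storage node.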

bingo.1

2.4K Posts

February 21st, 2020 09:00

Dataeth.001 is not a pool; it is a volume that belongs to a certain pool, whatever that pool's name is (Dataeth?).

If this device/volume is the only one belonging to the pool, backups most likely cannot continue once the connection has been interrupted (for whatever reason). So it is best practice to have more than one device/volume per pool to give NW an alternative.

The volume itself seems to be fine; at least it is still 'write enabled'. I hope it is still 'appendable' too; if it were not, you should have received a proper alert. Otherwise I do not have a direct clue, as there could be too many different problems. Here is what I would do:

- Try to unmount and re-mount the volume (do not label it!!!)

- Try a short manual backup to this pool (save -b pool_name whatever_directory)

- If this fails, take a closer look at the error messages and the alert. daemon.raw, at the very least, should tell you more.

- Add a second device/volume to the same pool.

Good luck.

uvarovnick

February 24th, 2020 22:00

Yes, Dataeth.001 is not a pool; it is a volume.
There are several disks in the pool:
Data Domain disk Oraclefc.001 mounted on dd6800-vc1.vaz.ru_Oracle-fc, write enabled
Data Domain disk FA.002 mounted on dd6800-vc1.vaz.ru_FA, write enabled
Data Domain disk Dataeth.001 mounted on dd6800-vc1.vaz.ru_Data-eth, write enabled
Data Domain disk Datafc.001 mounted on dd6800-vc1.vaz.ru_Data-fc, write enabled
Data Domain disk Index.001 mounted on dd6800-vc1.vaz.ru_Index, write enabled
But the Dataeth.001 disk alone does not work.
I tried unmounting and remounting Dataeth.001. That worked fine, with no errors.
A short manual backup fails. Here are the console messages:
[root@nsr logs]# save -b Data-eth /nsr/logs/daemon.raw
181407: save: Step (1 of 7) for PID-25435: Save has been started on the client 'nsr.vaz.ru'.
175313: save: Step (2 of 7) for PID-25435: Running the backup on the client 'nsr.vaz.ru' for the selected save sets.
3817: save: Using nsr.vaz.ru as server
57777: save: Multiple client instances of nsr.vaz.ru, using the first entry

180569: save: Identified a save for the backup with PID-25435 on the client 'nsr.vaz.ru'. Updating the total number of steps from 7 to 6.
174920: save: Step (3 of 6) for PID-25435: Contacting the NetWorker server through the nsrd process to obtain a handle to the target media device through the nsrmmd process for the save set '/nsr/logs/daemon.raw' .
174908: save: Saving the backup data in the pool 'Data-eth'.

175297: save: Unable to set up the direct save with server 'nsr.vaz.ru': busy.
90016: save: NetWorker server 'nsr.vaz.ru': busy
5766: save: waiting 30 seconds then retrying

On another disk, a short manual backup succeeds:
[root@nsr logs]# save -b FA /nsr/logs/daemon.raw
181407: save: Step (1 of 7) for PID-24840: Save has been started on the client 'nsr.vaz.ru'.
175313: save: Step (2 of 7) for PID-24840: Running the backup on the client 'nsr.vaz.ru' for the selected save sets.
3817: save: Using nsr.vaz.ru as server
57777: save: Multiple client instances of nsr.vaz.ru, using the first entry

180569: save: Identified a save for the backup with PID-24840 on the client 'nsr.vaz.ru'. Updating the total number of steps from 7 to 6.
174920: save: Step (3 of 6) for PID-24840: Contacting the NetWorker server through the nsrd process to obtain a handle to the target media device through the nsrmmd process for the save set '/nsr/logs/daemon.raw' .
174908: save: Saving the backup data in the pool 'FA'.

175019: save: Received the media management binding information on the host 'nsr.vaz.ru'.

174910: save: Connected to the nsrmmd process on the host 'nsr.vaz.ru'.

175295: save: Successfully connected to the Data Domain device.
129292: save: Successfully established Client direct save session for save-set ID '827635027' (nsr.vaz.ru:/nsr/logs/daemon.raw) with Data Domain volume 'FA.002'.

174922: save: Step (4 of 6) for PID-24840: Successfully connected to the target media device through the nsrmmd process on the host 'nsr.vaz.ru' for the save set '/nsr/logs/daemon.raw' .
174422: save: Step (5 of 6) for PID-24840: Reading the save sets and writing to the target device.
66135: save: NSR directive file (/nsr/logs/.nsr) parsed
/nsr/logs/daemon.raw

Unfortunately, I see no errors in daemon.raw.
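With nothing in daemon.raw, the two console logs above are the clearest clue: the Data-eth save never gets past "Step (3 of 6)" (obtaining a handle to the target media device through nsrd/nsrmmd) and keeps retrying with "busy", while the FA save reaches step 5. A minimal Python sketch that reports the last "Step (N of M)" marker in such output, assuming the marker format stays as printed in these logs (the excerpts below are abbreviated copies of them):

```python
import re
from typing import Optional, Tuple

# Matches progress markers such as "Step (3 of 6)" in save's console output.
STEP_RE = re.compile(r"Step \((\d+) of (\d+)\)")


def last_step(log: str) -> Optional[Tuple[int, int]]:
    """Return (step, total) from the last 'Step (N of M)' marker, or None if absent."""
    matches = STEP_RE.findall(log)
    if not matches:
        return None
    step, total = matches[-1]
    return int(step), int(total)


# Abbreviated excerpts of the two console logs posted above.
FAILED = """\
181407: save: Step (1 of 7) for PID-25435: Save has been started on the client 'nsr.vaz.ru'.
174920: save: Step (3 of 6) for PID-25435: Contacting the NetWorker server through the nsrd process ...
175297: save: Unable to set up the direct save with server 'nsr.vaz.ru': busy.
5766: save: waiting 30 seconds then retrying
"""

OK = """\
181407: save: Step (1 of 7) for PID-24840: Save has been started on the client 'nsr.vaz.ru'.
174920: save: Step (3 of 6) for PID-24840: Contacting the NetWorker server through the nsrd process ...
175295: save: Successfully connected to the Data Domain device.
174922: save: Step (4 of 6) for PID-24840: Successfully connected to the target media device ...
174422: save: Step (5 of 6) for PID-24840: Reading the save sets and writing to the target device.
"""

if __name__ == "__main__":
    print("Data-eth:", last_step(FAILED))  # Data-eth: (3, 6)
    print("FA:", last_step(OK))            # FA: (5, 6)
```

A save that stalls at the device-handle step while other pools work is consistent with crazyrov's diagnosis that the server could not allocate the device.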

uvarovnick

February 25th, 2020 02:00

crazyrov, thanks! I am very grateful for this help! Now my pool is working!

bingo.1

February 25th, 2020 03:00

BTW, another method would be to kill the corresponding nsrmmd process on the storage node. Unfortunately, you cannot easily figure out which process serves which device. Stopping the corresponding nsrsnmd would also help, but it will of course affect all devices on that storage node, so you had better do that when the other devices are idle.
