13 Posts

April 27th, 2021 23:00

The NetWorker backup of this client fails, while other clients back up successfully. Why? The log looks like this:

suppressed 7447 bytes of output.
2021/4/27 21:14:36 Starting the '/data/public' job on host 'tjhdzx-db07'.
2021/4/27 21:14:36 tjhdzx-db07:/data/public started
0 1619529276 1 5 0 3228 7252 0 win-gk5eesvvbji savegrp NSR notice 2 %s 1 0 491 save -LL -s win-gk5eesvvbji -g jhdzx-db07/Workflow1/backup/jhdzx-db07 -a "*policy action jobid=192474" -a "*policy name=jhdzx-db07" -a "*policy workflow name=Workflow1" -a "*policy action name=backup" -y "Thu May 27 23:59:59 GMT+0800 2021" -w "Thu May 27 23:59:59 GMT+0800 2021" -m tjhdzx-db07 -b datadomainlocaldomain -t 1619528410 -o RENAMED_DIRECTORIES:index_lookup=on;BACKUPTIME:lookup_range=1619356649:1619528410;REQUESTED_LEVEL:level=incr; -l incr -q -W 78 -N /data/public /data/public
2021/4/27 21:14:41 Group jhdzx-db07 waiting for 1 jobs (0 awaiting restart) to complete.
2021/4/27 21:26:04 The save job for the save set '/data/public' on the host 'tjhdzx-db07' has been completed.
2021/4/27 21:26:04 Job 192500 host: tjhdzx-db07 savepoint: /data/public had WARNING indication(s) at completion
2021/4/27 21:26:04 tjhdzx-db07:/data/public failed.
2021/4/27 21:26:04 jhdzx-db07:tjhdzx-db07:/data/public See the file 'C:\Program Files\EMC NetWorker\nsr\logs\policy\jhdzx-db07\Workflow1\backup_192474_logs\192500.log' for command output.

The nsr processes and ports appear to be normal.

1.3K Posts

April 28th, 2021 01:00

Wow!! This is a first. Please provide some information about the backup and the circumstances of the failure. You can't simply throw in a piece of unformatted log and expect us to resolve your issue.

Hint: You might be able to get some information about the failure here: C:\Program Files\EMC NetWorker\nsr\logs\policy\jhdzx-db07\Workflow1\backup_192474_logs\192500.log

13 Posts

April 28th, 2021 02:00

Thank you very much.

192500.log:

181407:save: Step (1 of 7) for PID-41412: Save has been started on the client 'tjhdzx-db07'.
174412:save: Step (2 of 7) for PID-41412: Running the backup on the client 'tjhdzx-db07' for the save set '/data/public'.
180569:save: Identified a save for the backup with PID-41412 on the client 'tjhdzx-db07'. Updating the total number of steps from 7 to 6.
174920:save: Step (3 of 6) for PID-41412: Contacting the NetWorker server through the nsrd process to obtain a handle to the target media device through the nsrmmd process for the save set '/data/public'.
174908:save:Saving the backup data in the pool 'datadomainlocaldomain'.
175019:save:Received the media management binding information on the host 'win-gk5eesvvbji'.
174910:save:Connected to the nsrmmd process on the host 'win-gk5eesvvbji'.
175295:save: Successfully connected to the Data Domain device.
174922:save: Step (4 of 6) for PID-41412: Successfully connected to the target media device through the nsrmmd process on the host 'win-gk5eesvvbji' for the save set '/data/public'.
174422:save: Step (5 of 6) for PID-41412: Reading the save sets and writing to the target device.
174416:save: Step (6 of 6) for PID-41412: Backup has succeeded. Save is exiting. See the savegrp log to track the closure steps of the backup.
tjhdzx-db07: /data/public level=incr, 1483 MB 00:11:23 12 files
completed savetime=1619529281
94694:save: The backup of save set '/data/public' succeeded.

2.4K Posts

April 28th, 2021 03:00

174416:save: Step (6 of 6) for PID-41412: Backup has succeeded. Save is exiting. See the savegrp log to track the closure steps of the backup.
tjhdzx-db07: /data/public level=incr, 1483 MB 00:11:23 12 files
completed savetime=1619529281
94694:save: The backup of save set '/data/public' succeeded.

So, where is the problem?

 

13 Posts

April 29th, 2021 00:00

The command log shows that the backup succeeded, but the raw log reports it as failed:

174897 1619615692 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 71 The save job for the save set '%s' on the host '%s' has been completed. 2 20 12 /data/public 12 11 tjhdzx-db07
166684 1619615692 2 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR warning 64 Job %u host: %s savepoint: %s had %s indication(s) at completion 4 5 6 192558 12 11 tjhdzx-db07 21 12 /data/public 0 7 WARNING
90491 1619615692 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 9 %s:%s %s. 3 12 11 tjhdzx-db07 51 12 /data/public 0 6 failed
90492 1619615692 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 33 %s:%s will retry %d more time(s). 3 12 11 tjhdzx-db07 51 12 /data/public 1 1 1
121508 1619615692 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 31 %s:%s next retry in %d seconds. 3 12 11 tjhdzx-db07 51 12 /data/public 1 1 1
174905 1619615692 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 43 %s:%s See the file '%s' for command output. 3 26 10 jhdzx-db07 12 24 tjhdzx-db07:/data/public 23 97 C:\Program Files\EMC NetWorker\nsr\logs\policy\jhdzx-db07\Workflow1\backup_192542_logs\192558.log
128137 1619615693 0 0 2 4644 6852 0 win-gk5eesvvbji savegrp NSR info 63 Group %s waiting for %d jobs (%d awaiting restart) to complete. 3 26 10 jhdzx-db07 1 1 1 1 1 1
174903 1619615693 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 73 Constructing the save command for the save set '%s' on the host '%s': %s. 3 20 12 /data/public 12 11 tjhdzx-db07 20 491 save -LL -s win-gk5eesvvbji -g jhdzx-db07/Workflow1/backup/jhdzx-db07 -a "*policy action jobid=192542" -a "*policy name=jhdzx-db07" -a "*policy workflow name=Workflow1" -a "*policy action name=backup" -y "Fri May 28 23:59:59 GMT+0800 2021" -w "Fri May 28 23:59:59 GMT+0800 2021" -m tjhdzx-db07 -b datadomainlocaldomain -t 1619614809 -o RENAMED_DIRECTORIES:index_lookup=on;BACKUPTIME:lookup_range=1619356649:1619614809;REQUESTED_LEVEL:level=incr; -l incr -q -W 78 -N /data/public /data/public
174896 1619615693 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 35 Starting the '%s' job on host '%s'. 2 20 12 /data/public 12 11 tjhdzx-db07
83643 1619615693 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 13 %-58s started 1 0 24 tjhdzx-db07:/data/public
0 1619615693 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 2 %s 1 0 491 save -LL -s win-gk5eesvvbji -g jhdzx-db07/Workflow1/backup/jhdzx-db07 -a "*policy action jobid=192542" -a "*policy name=jhdzx-db07" -a "*policy workflow name=Workflow1" -a "*policy action name=backup" -y "Fri May 28 23:59:59 GMT+0800 2021" -w "Fri May 28 23:59:59 GMT+0800 2021" -m tjhdzx-db07 -b datadomainlocaldomain -t 1619614809 -o RENAMED_DIRECTORIES:index_lookup=on;BACKUPTIME:lookup_range=1619356649:1619614809;REQUESTED_LEVEL:level=incr; -l incr -q -W 78 -N /data/public /data/public
128137 1619615698 0 0 2 4644 6852 0 win-gk5eesvvbji savegrp NSR info 63 Group %s waiting for %d jobs (%d awaiting restart) to complete. 3 26 10 jhdzx-db07 1 1 1 1 1 0
174897 1619616393 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 71 The save job for the save set '%s' on the host '%s' has been completed. 2 20 12 /data/public 12 11 tjhdzx-db07
166684 1619616393 2 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR warning 64 Job %u host: %s savepoint: %s had %s indication(s) at completion 4 5 6 192564 12 11 tjhdzx-db07 21 12 /data/public 0 7 WARNING
90491 1619616393 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 9 %s:%s %s. 3 12 11 tjhdzx-db07 51 12 /data/public 0 6 failed
174905 1619616393 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 43 %s:%s See the file '%s' for command output. 3 26 10 jhdzx-db07 12 24 tjhdzx-db07:/data/public 23 97 C:\Program Files\EMC NetWorker\nsr\logs\policy\jhdzx-db07\Workflow1\backup_192542_logs\192564.log
148758 1619616398 1 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR notice 71 Action %s '%s' with job id %u is exiting with status '%s', exit code %d 5 0 18 backup traditional 0 6 backup 5 6 192542 0 6 failed 1 1 1

2.4K Posts

April 29th, 2021 11:00

I give up.

Once again, you are withholding valuable information, apparently because you are not familiar with NW fundamentals. This log is useless as posted because you have not run it through the proper log file 'converter', nsr_render_log, which replaces all your tiny variables with their actual values. You would then see more meaningful text, which might reveal the problem (and, indirectly, the solution).
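To illustrate what "rendering" means here, the following is a minimal Python sketch that substitutes the arguments into the format string of one raw line. It is not nsr_render_log and not a replacement for it. The field layout (13 header tokens, a length-prefixed format string, an argument count, then a type, length and value for each argument) is an assumption inferred only from the raw lines quoted in this thread, and `render_raw_line` is a hypothetical helper name.

```python
import re

# Header tokens before the format string, as they appear in the lines quoted
# in this thread (assumed layout): id, epoch time, six further fields, host,
# program, "NSR", severity, format-string length.
HEADER_TOKENS = 13


def render_raw_line(line):
    """Return (epoch_time, severity, message) for one raw savegrp log line."""
    parts = line.split(" ", HEADER_TOKENS)
    epoch_time = int(parts[1])
    severity = parts[11]
    fmt_len = int(parts[12])
    rest = parts[13]

    # The format string is length-prefixed, so it may contain spaces.
    fmt = rest[:fmt_len]
    rest = rest[fmt_len:].lstrip(" ")

    # Next comes the argument count, then (type, length, value) per argument.
    count, _, rest = rest.partition(" ")
    args = []
    for _ in range(int(count)):
        _arg_type, length, rest = rest.split(" ", 2)
        args.append(rest[: int(length)])
        rest = rest[int(length):].lstrip(" ")

    remaining = iter(args)

    def substitute(match):
        spec = match.group(0)
        value = next(remaining)
        # %d and %u need an integer; %s takes the text as-is.
        return spec % (int(value) if spec[-1] in "du" else value)

    return epoch_time, severity, re.sub(r"%-?\d*[sdu]", substitute, fmt)


raw = ("166684 1619615692 2 5 0 4644 6852 0 win-gk5eesvvbji savegrp NSR "
       "warning 64 Job %u host: %s savepoint: %s had %s indication(s) at "
       "completion 4 5 6 192558 12 11 tjhdzx-db07 21 12 /data/public 0 7 WARNING")
print(render_raw_line(raw)[2])
```

The sketch only handles the `%s`, `%d` and `%u` conversions seen in the quoted lines and does not cope with empty argument values; for real troubleshooting, use nsr_render_log on the raw file.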

May I suggest that you participate in a basic NW course which will explain NW fundamentals before you look at the problem again.

 

13 Posts

April 29th, 2021 22:00

I'm so sorry. I really am a newcomer to NetWorker. I'll learn some NW fundamentals.

May 3rd, 2021 12:00

As you indeed haven't rendered the output in human-readable format, it is difficult to assess the reason.

However, I see "WARNING indication(s) at completion" mentioned. Could the job be set to be reported as failed when a warning occurs? That can happen, for example, when you run into a "file changed during save" situation. Check whether that is the case here; the logs should state it (possibly the log file C:\Program Files\EMC NetWorker\nsr\logs\policy\jhdzx-db07\Workflow1\backup_192542_logs\192564.log, but that may already have been deleted, because the default retention for log files is 3 days (72 hours)).

If you don't mind that a file might change during the save, you could alter the job success state so that the job is reported as successful in case of warnings. Open files, such as DB files, should be skipped through a skip directive, either on the NW end or on the client end. We use the client end: application teams often create new application filesystem directories automatically, and they can create the required client-end directives right away without having to involve us. That way the filesystem backups do not back up the Oracle or SAP on Oracle directories that contain the DB files.

13 Posts

May 6th, 2021 23:00


181407:save: Step (1 of 7) for PID-20095: Save has been started on the client 'tjhdzx-db07'.
174412:save: Step (2 of 7) for PID-20095: Running the backup on the client 'tjhdzx-db07' for the save set '/data/public'.
180569:save: Identified a save for the backup with PID-20095 on the client 'tjhdzx-db07'. Updating the total number of steps from 7 to 6.
174920:save: Step (3 of 6) for PID-20095: Contacting the NetWorker server through the nsrd process to obtain a handle to the target media device through the nsrmmd process for the save set '/data/public'.
174908:save:Saving the backup data in the pool 'datadomainlocaldomain'.
175019:save:Received the media management binding information on the host 'win-gk5eesvvbji'.
174910:save:Connected to the nsrmmd process on the host 'win-gk5eesvvbji'.
175295:save: Successfully connected to the Data Domain device.
174922:save: Step (4 of 6) for PID-20095: Successfully connected to the target media device through the nsrmmd process on the host 'win-gk5eesvvbji' for the save set '/data/public'.
174422:save: Step (5 of 6) for PID-20095: Reading the save sets and writing to the target device.
174416:save: Step (6 of 6) for PID-20095: Backup has succeeded. Save is exiting. See the savegrp log to track the closure steps of the backup.
tjhdzx-db07: /data/public level=incr, 1648 MB 00:11:33 12 files
completed savetime=1620306915
94694:save: The backup of save set '/data/public' succeeded.
--- Job Indications ---
Warning: `/data/public/mongo/22017/run/mongod.log' size grew during save
Expected 1683615881 bytes for `/data/public/mongo/22017/run/mongod.log', got 1683627131 bytes
tjhdzx-db07:/data/public: retried 1 times.



It must be, as you said, a "file changed during save" situation. How can I alter the job success state so that the job is reported as successful in case of warnings? Thanks.
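For anyone who wants to pull these indications out of a job log automatically, here is a small Python sketch. It assumes the log text has the same shape as the excerpt above (a `--- Job Indications ---` marker followed by `Expected ... bytes for ..., got ... bytes` lines); `grown_files` is a hypothetical helper name, not part of NetWorker.

```python
import re

# Matches the "Expected N bytes for `path', got M bytes" line from the job log.
EXPECTED_RE = re.compile(r"Expected (\d+) bytes for `(.+)', got (\d+) bytes")


def grown_files(job_log_text):
    """Return (path, expected_bytes, actual_bytes, growth) for each file
    reported after the '--- Job Indications ---' marker."""
    _, marker, indications = job_log_text.partition("--- Job Indications ---")
    if not marker:
        return []
    results = []
    for line in indications.splitlines():
        match = EXPECTED_RE.search(line)
        if match:
            expected, actual = int(match.group(1)), int(match.group(3))
            results.append((match.group(2), expected, actual, actual - expected))
    return results


sample = """94694:save: The backup of save set '/data/public' succeeded.
--- Job Indications ---
Warning: `/data/public/mongo/22017/run/mongod.log' size grew during save
Expected 1683615881 bytes for `/data/public/mongo/22017/run/mongod.log', got 1683627131 bytes
tjhdzx-db07:/data/public: retried 1 times.
"""
for path, expected, actual, growth in grown_files(sample):
    print(f"{path}: grew by {growth} bytes during the save")
```

In the excerpt above, the only file involved is the mongod log, which grew by about 11 KB while the save was running.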

2.4K Posts

May 7th, 2021 02:00

It is the nature of some files that they change (grow) during the backup, that is, between the time at the very beginning when NW builds its internal worklist and the time it actually backs up the file. I am surprised that NW actually sets the status to failed; usually it just sets an appropriate warning internally.

 

How to solve the issue:

  - One method would be to stop the application before the backup and restart it afterwards.

  - If this is a NW-supported database, use the appropriate module, which will also ensure that you have a consistent backup. In this case you have to create two client resources for the same host: one to back up the filesystem without the database, and the other to back up only the database with the appropriate backup command (the NW server does the same for its internal databases).

  - You can apply the Unix standard directive (see below) to that client to overcome this issue. Do not forget to add the marked statement to the resource for your specific environment.

[Screenshots: asm_1.jpg, asm_2.jpg]

However, applying 'logasm' does not necessarily mean that the backup (of the database) will be consistent. So be careful.

You will find more details about the NW directives and 'ASMs' in the uasm manpage (command line reference).
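For illustration only, a directive of the kind described above might look like the fragment below. This is a sketch, not a reconstruction of the images referenced in this post: the directory and the `*.log` pattern are assumptions taken from the mongod.log path in the job log earlier in this thread, so adjust both to your environment and check the syntax against the nsr and uasm man pages.

```
<< /data/public/mongo/22017/run >>
logasm: *.log
```

As noted above, this only stops the growing log file from raising the warning; it does nothing for the consistency of the database files themselves.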

 

 

May 9th, 2021 12:00

As bingo also points out, it seems to me that you are backing up the log file of MongoDB. I assume you also have something in place to back up the DB? For example, in its simplest form, a dump to disk (preferably triggered by a post command from NW)? Or the NMDA module feature available from NW 18.1 onwards, called Orchestrated Application Protection, which leverages NW, NMDA and BoostFS. KB article 000158074 details how to do this for PostgreSQL.

For the "file changed during save" failure of the filesystem backup, look into the backup action options. If memory serves me well, there is an option there that determines whether a save set that completes with warnings is reported as successful or as failed.
