Unsolved

August 30th, 2013 02:00

nsrjobd.exe on networker server crashes when savepnpc oracle backups are running

We have a Windows Server 2008 R2 SP1 server running NetWorker version 7.6.5.3.Build.1195.

Two or three times a month, nsrjobd.exe crashes while savepnpc backups of Oracle on AIX are running.

I have applied several NetWorker updates over the last few months, and the Windows server is patched to the latest level. Nothing has helped.

Here is the report.wer from last night:

Version=1

EventType=BEX64

EventTime=130222957890058334

ReportType=2

Consent=1

ReportIdentifier=58274b8f-110a-11e3-adaa-f4ce46b94044

IntegratorReportIdentifier=58274b8e-110a-11e3-adaa-f4ce46b94044

Response.type=4

Sig[0].Name=Application Name

Sig[0].Value=nsrjobd.exe

Sig[1].Name=Application Version

Sig[1].Value=7.6.5.3

Sig[2].Name=Application Timestamp

Sig[2].Value=5155db4a

Sig[3].Name=Fault Module Name

Sig[3].Value=StackHash_1dc2

Sig[4].Name=Fault Module Version

Sig[4].Value=0.0.0.0

Sig[5].Name=Fault Module Timestamp

Sig[5].Value=00000000

Sig[6].Name=Exception Offset

Sig[6].Value=0000000000000000

Sig[7].Name=Exception Code

Sig[7].Value=c0000005

Sig[8].Name=Exception Data

Sig[8].Value=0000000000000008

--------------------------
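For anyone reading a report.wer like the one above, here is a minimal Python sketch that parses the `Key=Value` lines and decodes two of the fields. The function names are hypothetical. It assumes `EventTime` is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC). The reading of `c0000005` as an access violation, with exception data `8` meaning an attempt to execute at the faulting address, is general Windows behaviour and has not been confirmed for this particular nsrjobd crash.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EventTime is a Windows FILETIME (100 ns ticks since 1601-01-01 UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# Assumption: general Windows meaning of the access-violation type, not NetWorker-specific.
ACCESS_KIND = {0: "read", 1: "write", 8: "execute (DEP)"}


def parse_wer(text):
    """Return a dict of report fields, folding Sig[n].Name/Value pairs into named keys."""
    raw = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            raw[key] = value
    fields = {k: v for k, v in raw.items() if not k.startswith("Sig[")}
    n = 0
    while f"Sig[{n}].Name" in raw:
        fields[raw[f"Sig[{n}].Name"]] = raw.get(f"Sig[{n}].Value", "")
        n += 1
    return fields


def filetime_to_utc(ticks):
    """Convert FILETIME ticks to a timezone-aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# A subset of the report posted above.
SAMPLE = """\
EventType=BEX64
EventTime=130222957890058334
Sig[0].Name=Application Name
Sig[0].Value=nsrjobd.exe
Sig[1].Name=Application Version
Sig[1].Value=7.6.5.3
Sig[2].Name=Exception Code
Sig[2].Value=c0000005
Sig[3].Name=Exception Data
Sig[3].Value=0000000000000008
"""

report = parse_wer(SAMPLE)
event_utc = filetime_to_utc(int(report["EventTime"]))
exception_code = int(report["Exception Code"], 16)
access_kind = ACCESS_KIND.get(int(report["Exception Data"], 16), "unknown")

print(report["Application Name"], event_utc.isoformat())
print(hex(exception_code), access_kind)
```

Under that FILETIME assumption the event time comes out as 2013-08-30 00:23:09 UTC. If the server clock is two hours ahead of UTC (not stated in the post), that lines up with the 2:23:09 AM entries in the daemon.log.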

And the daemon.log:

8/30/2013 2:23:09 AM  savegrp Session channel closed by nsrjobd, exit code: (unknown) 076641

8/30/2013 2:23:09 AM  savegrp Session channel closed by nsrjobd, exit code: (unknown) 039078

8/30/2013 2:23:09 AM  savegrp RPC error: Connection lost to nsrjobd76641

8/30/2013 2:23:09 AM  savegrp Session channel closed by nsrjobd, exit code: (unknown) 076641

8/30/2013 2:23:09 AM  savegrp Session channel closed by nsrjobd, exit code: (unknown) 076641

8/30/2013 2:23:09 AM  savegrp Session channel closed by nsrjobd, exit code: (unknown) 076641

8/30/2013 2:23:09 AM  savegrp Session channel closed by nsrjobd, exit code: (unknown) 039078

8/30/2013 2:23:09 AM  savegrp RPC error: Connection lost to nsrjobd39078

8/30/2013 2:23:09 AM  savegrp RPC error: Connection lost to nsrjobd39078

8/30/2013 2:23:09 AM  savegrp RPC error: Connection lost to nsrjobd39078

8/30/2013 2:23:09 AM  savegrp RPC error: Connection lost to nsrjobd42506

8/30/2013 2:23:09 AM  nsrd networker daemons warning: nsrjobd exited with status code 255

39078 8/30/2013 2:23:09 AM  savegrp RPC error: Connection lost to nsrjobd0

8/30/2013 2:23:09 AM  nsrjobd NetWorker

0 8/30/2013 2:23:09 AM  nsrjobd 7.6.5.3.Build.1195

0 8/30/2013 2:23:09 AM  nsrjobd 1195

0 8/30/2013 2:23:09 AM  nsrjobd Fri Mar 29 10:17:07 2013

0 8/30/2013 2:23:09 AM  nsrjobd Build arch.:  ntx64

0 8/30/2013 2:23:09 AM  nsrjobd DBG=0,OPT=

39074 8/30/2013 2:23:09 AM  nsrjobd JOBS notice: Opening RAP database

42506 8/30/2013 2:23:11 AM  nsrd networker daemons info: Successfully restarted nsrjobd with PID 6832

39074 8/30/2013 2:23:11 AM  nsrd JOBS notice: All scheduled jobs which were in progress were killed

---------------
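To pull every nsrjobd exit out of a longer rendered daemon.log, a small scan like the following could be used. This is only a sketch: the function name is hypothetical, and it assumes the rendered log keeps the same timestamp format (`M/D/YYYY h:mm:ss AM/PM`) and the same "nsrjobd exited with status code" wording as the excerpt above.

```python
import re
from datetime import datetime

# Timestamp and message formats are taken from the excerpt above.
STAMP = re.compile(r"(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)")
EXIT = re.compile(r"nsrjobd exited with status code (\d+)")


def nsrjobd_exits(lines):
    """Return (timestamp, status) for each line reporting that nsrjobd exited."""
    hits = []
    for line in lines:
        exit_match = EXIT.search(line)
        stamp_match = STAMP.search(line)
        if exit_match and stamp_match:
            when = datetime.strptime(stamp_match.group(1), "%m/%d/%Y %I:%M:%S %p")
            hits.append((when, int(exit_match.group(1))))
    return hits


# Three lines copied from the log above.
SAMPLE_LINES = [
    "8/30/2013 2:23:09 AM  savegrp RPC error: Connection lost to nsrjobd42506",
    "8/30/2013 2:23:09 AM  nsrd networker daemons warning: nsrjobd exited with status code 255",
    "42506 8/30/2013 2:23:11 AM  nsrd networker daemons info: Successfully restarted nsrjobd with PID 6832",
]

exits = nsrjobd_exits(SAMPLE_LINES)
for when, status in exits:
    print(when.isoformat(), status)
```

In practice the list would be fed from the log file, for example `nsrjobd_exits(open(path, errors="replace"))`, where `path` is wherever the rendered log was saved.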

I'm running out of ideas. Can someone help?

Herman Regterschot.

9 Posts

August 30th, 2013 02:00


Jobsdb size = 60 MB (it was 40 MB until I changed it this morning).

Retention = 3 days

Job inactivity timeout: 0

I have deleted the jobsdb and the tmp directories many times, but unfortunately it doesn't help.

It always happens when savepnpc Oracle/AIX backups are running.

We have never had this problem when Windows/NMM (Exchange) backup jobs are running.

1.7K Posts

August 30th, 2013 02:00

Hi Herman,

Are you finding the nsrjobd crash always at around the same time? If so, what time and what processes/operations are running around that time? Maybe nsrim is running?

Could you please attach a rendered copy of the daemon log so we can see what is going on when the crash occurs?

Thank you,

Carlos

9 Posts

August 30th, 2013 02:00

Hi Carlos,

Here are the timestamps (day-month, time) of the most recent crashes:

30-8     2:23

7-8       2:18

3-8       2:50

1-8       2:16

17-7     3:38

3-7       3:39

24-6     23:22

3-6       23:22

17-5     2:18

15-5     23:22

and the daemon.log from the last few days.

Thanks.

Herman.

1 Attachment

1.7K Posts

August 30th, 2013 02:00

Hello Herman,

What is the jobsdb size and retention values you have set?

Have you tried deleting the jobsdb? To do so, stop the NW services on the NW server, then delete the folder /nsr/res/jobsdb and the /nsr/tmp folder while the services are stopped.

Thank you,

Carlos
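The cleanup described above could be scripted roughly as follows. This is a cautious sketch, not an official procedure: it renames the two directories instead of deleting them (a deliberate change from the advice above, so they can be restored), it does not stop or start any services (that must be done separately, beforehand), and the `nsr_root` argument is a placeholder for wherever the nsr directory lives on the server. The function name is hypothetical, and the demo runs against a throwaway temporary directory.

```python
import shutil
import tempfile
import time
from pathlib import Path


def set_aside(nsr_root, names=("res/jobsdb", "tmp"), dry_run=True):
    """Rename jobsdb and tmp under nsr_root; return the (source, destination) pairs.

    Assumption: the NetWorker services are already stopped when this runs.
    With dry_run=True nothing is changed; the planned renames are only returned.
    """
    root = Path(nsr_root)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in names:
        src = root / name
        if not src.is_dir():
            continue  # nothing to set aside
        dst = src.with_name(f"{src.name}.old-{stamp}")
        if not dry_run:
            shutil.move(str(src), str(dst))
        moved.append((src, dst))
    return moved


# Demo on a temporary directory standing in for the real nsr directory.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "res" / "jobsdb").mkdir(parents=True)
(demo_root / "tmp").mkdir()

planned = set_aside(demo_root)                 # dry run: report only
done = set_aside(demo_root, dry_run=False)     # actually rename
for src, dst in done:
    print(src.name, "->", dst.name)
```

Running it first with the default `dry_run=True` shows what would be renamed before anything is touched.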

1.7K Posts

September 9th, 2013 02:00

Hello,

Can you please check the maintenance tasks with the Oracle DBA? For example, how often are the RMAN cross-checks being run?

RMAN cross-checks are very resource-intensive on the NW server side, so many cross-checks running alongside many backup/restore jobs could be a cause of the issue. Otherwise, I think it would be better to raise a case with support, in case this nsrjobd crash is a known issue that has already been addressed in a hotfix binary or a new build.

Thank you,

Carlos
