Scheduled SQL backup hangs

Question

Hi,

I've had a problem with my SQL backups with NetWorker for many months now, and I hope someone has seen this problem elsewhere.

First, here's my configuration:
NetWorker server version: 7.4.4
NetWorker Module for SQL version: 5.2

Here's what is going on: not at every backup, but from time to time, 1 or 2 servers/clusters stay at the top of the backup window in the NetWorker GUI (the "Waiting for execution" box). When this happens, the backup never fails or stops; the only thing we can do is probe the group and then restart it. If we do that, it can work well for a while, then the problem happens again, possibly on a different server.

The group runs every 30 minutes. Normally it never takes more than 3-4 minutes to complete.

Thanks and have a good day,
Christian

PeterBen · Answer

Poor description. OS/SQL version? Architecture? Are any tasks running on the SQL server that could interrupt the backups? Have you checked the application logs on the client for errors? Event Viewer errors?

Mark_Bellows · Answer

Good afternoon!

I need a little more information before I can help with this issue.

What is the operating system?

What is the client's nsrexecd version?

Are you having this issue with any other servers?

How long has this client been part of the backup environment?

Have there been any recent changes to the Operating System or network switches?

Last one! What is the connection to the tape library?

Also, if you can attach the daemon.raw file from the client system to a post so I can look at it, I may be able to help you narrow down the problem.

Thank you!

Mark

CarrierC · Answer

Hello Peter and Mark,

Thank you for your responses. I will add more information below as asked (sorry if it was not descriptive enough at first).

All our SQL servers run Windows 2003 R2 SP2 (x86 or x64).

All of them run NetWorker client version 7.4.4 (is that the nsrexecd version you mean?).

No, this issue is only seen with SQL servers. We have many other systems in the infrastructure and none of them gives us this kind of problem.

The migration started at the beginning of January 2010, and the SQL clients were configured in the weeks or months following that date, so they have been configured for a couple of months.

From my point of view, there has been no major change to the servers or the network environment that could be causing this problem. The problem was there from the beginning; it was just not seen as often as now, since far fewer servers were configured.

On the network side, we had them check for any port problems, traffic passing through the firewall, and so on; nothing apparent was detected. We get the problem even with servers in the internal zone (no firewall).

One more point: as stated earlier, we are in the process of migrating from a Tivoli environment, and I noticed that the SQL servers have both the Tivoli SQL module and the NetWorker SQL module installed at the same time. Based on the problem we saw with Oracle, we thought the same issue might affect SQL (this still needs to be validated).

Now I'll explain a bit how our SQL backups work (architecture):

SN: Storage Node
DSN: Dedicated Storage Node
EDL: EMC Disk Library

*** All backups are scheduled with third-party software ***

With the third-party software, we have scheduled a SQL backup every 30 minutes. Here and there these days, the group reaches 90-95% and never fails: one client stays stuck at the top, never starting and never finishing the group, which holds up the next backup. All the clients in those groups either send their metadata to an SN over the LAN or use their own virtual drives (DSN) to back up their data. The SN or the client DSN is connected to an EDL4206 and stores the metadata there. The SN and DSN traffic does not go over the LAN, only over the SAN.

If you need more information, please advise and I'll add more.

For Mark: before I can upload a daemon.raw file from my application server, I'll need authorization from my client. I'll let you know.

Thanks and have a good day,
Christian Carrier
SAN administrator

Holger_Inf · Answer

If the scheduled backup is level incr and an nsrindexd -READ process (for the SQL client) is stuck in the process list on the backup server, it may be the same problem I had. We solved it (to about 99%) by updating the backup server from 7.4.4 to 7.4.5 Build 784.

CarrierC · Answer

Hello,

Thanks for your answer.

I'd like to know how I am supposed to check whether an "nsrindexd -READ" process is stuck in the process list. Will it return some kind of information or logs?

Thanks, it's a good avenue that needs to be looked at.

Christian Carrier

SAN administrator

Holger_Inf · Answer

On Unix systems, you can use ps to see the start time of the nsrindexd -READ process and decide how long it has been running.
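The ps check can be sketched in Python. This is a minimal sketch assuming a Linux host with procps `ps`, whose `etimes` field reports elapsed seconds since a process started; HP-UX and other Unix `ps` implementations may not support that field. The function names and the 30-minute threshold (the group's scheduling interval) are illustrative choices, not NetWorker tooling.

```python
import subprocess


def read_ps() -> str:
    """Return one line per process: PID, elapsed seconds, full command line.

    ASSUMPTION: procps ps (Linux); the trailing '=' suppresses the header row.
    """
    return subprocess.run(
        ["ps", "-eo", "pid=,etimes=,args="],
        capture_output=True, text=True, check=True,
    ).stdout


def find_long_running(ps_output: str, min_seconds: int) -> list[tuple[int, int, str]]:
    """Return (pid, elapsed_seconds, command) for every nsrindexd -READ
    process that has been running for at least min_seconds."""
    hits = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, etimes, rest of the command line
        if len(parts) < 3:
            continue
        pid, etimes, args = parts
        if "nsrindexd" in args and "-READ" in args.split():
            elapsed = int(etimes)
            if elapsed >= min_seconds:
                hits.append((int(pid), elapsed, args))
    return hits
```

For example, `find_long_running(read_ps(), 30 * 60)` would list any nsrindexd -READ process still alive after one full scheduling interval, which is suspicious given that the group normally finishes in 3-4 minutes.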

Again on Unix, you may use tusc/strace or a similar tool and attach it to the PID.

If you collect the output, it may show "too many open files" repeating, where the file in question is an index record. If you search long enough, you may find that the file names are looping (over every index record file for this saveset name). Eventually, you may calculate the Unix timestamp from the hex code in the file name and find the matching saveset with mminfo. In my case, nsrindexd tried to open very old index record files, which were long obsolete due to subsequent level full backups. But your case may be different.
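Once a trace has been captured (for example with `strace -f -p <PID> -o nsrindexd.trace` on Linux), the "looping" check and the hex-to-timestamp step can be sketched as below. This assumes strace's usual `open`/`openat` line format and, purely as a hypothetical, that each index record file name's stem is a hexadecimal Unix timestamp; verify the actual naming in your own index directory before relying on it. The helper names are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Matches failed open()/openat() calls in strace output, e.g.
# openat(AT_FDCWD, "/some/path", O_RDONLY) = -1 EMFILE (Too many open files)
EMFILE_RE = re.compile(r'\bopen(?:at)?\(.*?"([^"]+)".*= -1 EMFILE')


def emfile_paths(strace_text: str) -> Counter:
    """Count how often each path failed with EMFILE ("Too many open files").

    Paths with high counts are candidates for the looping described above.
    """
    counts = Counter()
    for line in strace_text.splitlines():
        m = EMFILE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def hex_stamp_to_utc(path: str) -> datetime:
    """Decode a file name stem as a hex Unix timestamp (UTC).

    HYPOTHETICAL naming scheme: check real index record names first.
    """
    stem = PurePosixPath(path).stem
    return datetime.fromtimestamp(int(stem, 16), tz=timezone.utc)
```

A path whose count keeps climbing is a candidate for the loop; the decoded date can then be compared against the savesets that mminfo reports for that client.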

CarrierC · Answer

Hi, if anyone has other ideas on the cause of this issue, I'd be really happy to hear from you. Thanks, Christian Carrier, SAN Administrator

rovinabi · Answer

Hi Chris,

This may sound silly, but it has worked in some cases. Can you try to create a new group in NetWorker, move the client to the new group, and see if it behaves the same way? Also, you said that the group hangs at 90-95% or so: does it hang on the index backup or on the saveset backup itself?
