PeterBen
120 Posts
0
June 17th, 2010 07:00
Poor description. OS/SQL version? Architecture? Any tasks running on the SQL server which can interrupt the backups?
Have you checked the application logs on the client for any errors? Event Viewer errors?
Mark_Bellows
240 Posts
0
June 17th, 2010 11:00
Good afternoon!
I need a little more information before I can help with this issue.
What is the operating system?
What is the client's nsrexecd version?
Are you having this issue with any other servers?
How long has this client been part of the backup environment?
Have there been any recent changes to the Operating System or network switches?
Last one! What is the connection to the tape library?
Also, if you can attach the daemon.raw file from the client system to a post so I can look at it, I may be able to help you narrow down the problem.
Thank you!
Mark
CarrierC
12 Posts
0
June 29th, 2010 12:00
Hello Peter and Mark,
Thank you for your responses. I will add more information below, as asked. (Sorry if my first description was not detailed enough.)
OK, so all our SQL servers are Windows 2003 R2 SP2 (x86 or x64).
All of them are running the NetWorker client version 7.4.4 (nsrexecd?).
No, this issue is only seen with SQL servers. We have a lot of other systems in the infrastructure and none of them give us this kind of problem.
The migration started at the beginning of January 2010, and the SQL clients were configured in the weeks and months following that date. So we can say they have been configured for a couple of months.
From my point of view, there has not been any major change to the servers or the networking environment that could be causing this problem. The problem was there from the beginning; it was just not seen as often as now, since far fewer servers were configured.
On the network side, we had them check for any port problems (passing through the firewall and so on), and nothing apparent was detected. We get the problem even with servers in the internal zone (no firewall).
One more point: as stated earlier, we are in the process of migrating from a Tivoli environment, and I noticed that the SQL servers have both the SQL module for Tivoli and the one for NetWorker installed at the same time. Given the problem we saw with Oracle, we thought the same issue might affect SQL (this still needs to be validated).
Now I'll explain a bit how our SQL backups work (architecture).
SN : Storage Node
DSN : Dedicated Storage Node
EDL : EMC disk library
*** All backups are scheduled with third-party software ***
The way it's done is as follows:
With the third-party software, we have scheduled a SQL backup every 30 minutes. Every now and then these days, the group reaches 90-95% completion and never fails, which holds up the next backup: one client stays stuck at the top, never starting, so the group never finishes. All the clients in those groups either send their metadata to an SN over the LAN or use their own virtual drives (DSN) to back up their data. The SN or the client DSN is linked to an EDL4206 and stores its metadata there. The SN or DSN does not go through the LAN, only through the SAN.
If you need more information, please advise and I'll add more.
For Mark: before I can upload a daemon.raw file from my application server, I'll need authorization from my client. I'll let you know.
Thanks and have a good day,
Christian Carrier
SAN administrator
Holger_Inf
1 Rookie
•
122 Posts
0
June 29th, 2010 23:00
If the scheduled backup is level incr and an nsrindexd -READ (for the SQL client) is stuck in the process list on the BU server, then
it may be a problem I had too. We solved it (99%) by updating the BU server from 7.4.4 to 7.4.5.Build.784.
CarrierC
12 Posts
0
June 30th, 2010 06:00
Hello,
Thanks for your answer.
I'd like to know how I am supposed to see whether something is stuck in the process list for the "nsrindexd -READ" command. Will it return some kind of information or logs?
Thanks, it's a good avenue that needs to be looked at.
Christian Carrier
SAN administrator
Holger_Inf
1 Rookie
•
122 Posts
0
June 30th, 2010 08:00
On Unix systems, ps shows you the start time of the nsrindexd -READ process, so you can work out how long
it has been running.
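As a rough illustration of the ps check, here is a minimal Python sketch. It assumes you have captured the output of `ps -eo pid,etime,args` on the backup server, where the POSIX `etime` column has the form `[[dd-]hh:]mm:ss`. The function names and the one-hour threshold are made up for this example and are not part of NetWorker.

```python
def parse_etime(etime: str) -> int:
    """Convert a ps 'etime' value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Pad missing leading fields (hours) with zero.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def long_running(ps_output: str, name: str = "nsrindexd", min_seconds: int = 3600):
    """Return (pid, elapsed_seconds, args) for matching processes
    that have been running for at least min_seconds (assumed threshold)."""
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, etime, args = fields
        elapsed = parse_etime(etime)
        if name in args and elapsed >= min_seconds:
            hits.append((int(pid), elapsed, args))
    return hits
```

Anything this returns is only a candidate: a long elapsed time does not by itself prove the process is stuck, so you would still want to trace it.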
Again on Unix, you may use tusc/strace or something similar and attach it to the PID.
If you collect the output, it may show you
too many open files
repeating; the file in question is an index record. If you look long enough you may find that the file names
are looping (over every index record file for this saveset name). Eventually you may calculate the
Unix timestamp from the hex code of the file and find the matching saveset with mminfo. In my case
nsrindexd tried to open very old index record files, which were long obsolete due to subsequent
level full backups. But your case may be different.
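The hex-to-timestamp step can be sketched in Python as below. This assumes, as described above, that the file name (minus any extension) is the save time in seconds since 1970, written in hexadecimal. The sample file name `4c2a0000.rec` and the function names are invented for illustration.

```python
from datetime import datetime, timezone
from pathlib import PurePath


def hex_name_to_epoch(file_name: str) -> int:
    """Interpret the file name stem as a hexadecimal Unix timestamp (assumption)."""
    stem = PurePath(file_name).stem  # drops an extension such as ".rec" if present
    return int(stem, 16)


def hex_name_to_utc(file_name: str) -> datetime:
    """Return the timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(hex_name_to_epoch(file_name), tz=timezone.utc)


if __name__ == "__main__":
    # Hypothetical file name seen in the trace output.
    name = "4c2a0000.rec"
    print(hex_name_to_epoch(name), hex_name_to_utc(name).isoformat())
```

The resulting date can then be compared against the save times that mminfo reports for the client, to see which saveset the looping index record belongs to.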
CarrierC
12 Posts
0
July 29th, 2010 06:00
Hi,
If anyone has other ideas on the cause of this issue, I'll be really happy to hear from you.
Thanks,
Christian Carrier
San Administrator
rovinabi
55 Posts
0
July 29th, 2010 13:00
Hi Chris,
This may sound silly, but it has worked in some cases. Can you try to create a new group in NetWorker, move the client to the new group, and see if it behaves the same way? Also, you said that the group hangs at 85% or so; does it hang on the index backup or on the saveset backup itself?