bingo.1
2.4K Posts
July 24th, 2013 09:00
I don't understand: "We have this issue on approx. 100-300 SQL machines each month and it's never recurring."
Does this mean it occurs only once per client, per month?
"If you re-run the backup it will complete."
This could point to another process that runs at the same time as the backup but has ended by the time you re-run it. A scheduled task?
I do not expect a NetWorker timeout issue, as that would time out after 30 minutes.
Talk to EMC support. If there is a server where the failure rate is higher, I would suggest that you switch on debug mode there for more info.
petterssonnl
4 Posts
July 25th, 2013 00:00
I was trying to give some background. We have this issue on almost all clients, but it is not recurring. If the fault does occur, you can restart the backup and it will succeed. By mentioning that, my goal was to keep this thread from being filled with suggestions about DNS or other basic troubleshooting that has already been done.
I don't understand your theory of another process ending the backup. That would mean something else takes over the session and ownership of the shadow copy and then closes the connection. A bit far-fetched if you ask me.
EMC support has not yet provided any useful solutions for the last two SRs we've created. Mostly they ask for log files and give no advice.
I'd like to know why the exception "BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device" is thrown and what is being done before it, so we can troubleshoot this further. We need to narrow it down.
Is it the SAN disk, network latency or packet drops, the SQL server running out of resources, shadow copy space running low so that the OS deletes the copy before the backup completes, or something else?
petterssonnl
4 Posts
July 25th, 2013 02:00
I might have misunderstood you regarding the process. I get that something aborts the backup and that this is the root cause of the problem.
I've done some more searching today and found a Symantec KB article stating that the server runs out of resources and aborts the process itself.
http://www.symantec.com/business/support/index?page=content&id=TECH5970
So I'm going to add a script to our alert management that sets the NetWorker parallelism to 2 instead of 4 when I hit this problem. It's not a solution but a possible workaround. It would be nice to get more logging on this issue to see the real problem, but I don't know where to start, and the D9 debug level on nsrsqlsv.exe doesn't show anything interesting.
It would be terrible if it's the OS that can't manage its own resources properly and needs to abort operations just because it has allocated too many resources to one process...
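A rough sketch of the decision part of such an alert hook, in Python. It only shows the logic: scan the failed run's log lines for the flush failure text quoted in this thread and pick 2 instead of 4 for the next run. The function and constant names are made up for illustration, the log format is an assumption, and actually applying the value to the NetWorker client is left out on purpose because that depends on the environment.

```python
# Error text as quoted in this thread; the surrounding log format is assumed.
FLUSH_FAILURE = "Flush failure on backup device"

NORMAL_PARALLELISM = 4   # value currently in use, per the post above
REDUCED_PARALLELISM = 2  # workaround value, per the post above


def pick_parallelism(log_lines, current=NORMAL_PARALLELISM):
    """Return the parallelism to use for the next run of one client.

    Hypothetical helper: it drops to the reduced value only if the
    flush failure shows up in the given log lines, and never raises
    the value above what is currently configured.
    """
    for line in log_lines:
        if FLUSH_FAILURE in line:
            return min(current, REDUCED_PARALLELISM)
    return current
```

The alert script would call this with the output of the failed run and hand the result to whatever changes the client setting.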
petterssonnl
4 Posts
July 25th, 2013 02:00
Yeah, I know that's the issue, and we've concluded that too.
bingo.1
2.4K Posts
July 25th, 2013 02:00
"I don't understand your theory of another process that ends the backup."
I did not mean that "it will take the session". However, it might influence the network behavior or load, which would affect the backup indirectly. For example, it could be a SQL maintenance process that uses more RAM than necessary.
I am not a SQL specialist. I don't have to be, because we export the databases and back up the export files. In fact, we have dropped the SQL module. It has worked fine so far, but it is backup-admin friendly, not db-admin friendly. So my SQL experience is limited. However, if I see a message like "Operating system error 995", I would first check what it stands for. For Windows 2008 the message is clear:
D:\>net helpmsg 995
The I/O operation has been aborted because of either a thread exit or an application request.
D:\>
Assuming the message code is correct, you can search the internet, and you may also find this page, which refers to the same code with the OS text: Dell Software - Knowledge Base Articles. This might give you or your db-admin a clue.
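If you have many failed runs to go through, the same check can be scripted. Below is a minimal Python sketch that pulls "Operating system error NNN" codes out of a piece of log text and attaches the description. The table contains only code 995, with the text printed by net helpmsg above. The function names and the sample message layout are assumptions for illustration, not something taken from the NetWorker or SQL Server logs in this thread.

```python
import re

# Only the code discussed in this thread; text as printed by "net helpmsg 995".
KNOWN_OS_ERRORS = {
    995: "The I/O operation has been aborted because of either "
         "a thread exit or an application request.",
}

# Matches the wording quoted above, e.g. "Operating system error 995".
OS_ERROR_RE = re.compile(r"Operating system error (\d+)", re.IGNORECASE)


def os_error_codes(text):
    """Return all OS error codes mentioned in the text, in order."""
    return [int(code) for code in OS_ERROR_RE.findall(text)]


def describe(code):
    """Return the known description, or a hint to look the code up."""
    return KNOWN_OS_ERRORS.get(code, "not in table, try: net helpmsg %d" % code)


if __name__ == "__main__":
    # Hypothetical sample line; the real message layout may differ.
    sample = "Flush failure on backup device. Operating system error 995."
    for code in os_error_codes(sample):
        print(code, describe(code))
```

When run on a Windows machine, Python's ctypes.FormatError(995) should give the same text straight from the OS, so the table is only there to keep the sketch portable.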