NickC_UK
51 Posts
0
T420 Server 2012 R2 backup causes disk failure event IDs 153/140/14
Running Windows Server Backup causes the following disk errors. We have a RAID 1+0 (4-disk) array on a PERC H310 controller:
- Event ID 153 (System, disk): The IO operation at logical block address ef0 for Disk 1 (PDO name: \Device\00000044) was retried.
- Event ID 140 (System, Microsoft-Windows-Ntfs): The system failed to flush data to the transaction log. Corruption may occur in VolumeId: C:, DeviceName: \Device\HarddiskVolume7. (The I/O device reported an I/O error.)
- Event ID 14 (System, volsnap): The shadow copies of volume C: were aborted because of an IO failure on volume C:.
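For anyone trying to correlate these three events, a single System-log query can pull them together in time order. This is a sketch, to be run in an elevated PowerShell prompt:

```shell
# List recent disk/NTFS/volsnap warnings (153, 140, 14) from the System log
Get-WinEvent -FilterHashtable @{
    LogName = 'System'
    Id      = 153, 140, 14
} -MaxEvents 50 |
    Sort-Object TimeCreated |
    Format-Table TimeCreated, Id, ProviderName, Message -AutoSize -Wrap
```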
DELL-Geoff P
990 Posts
0
March 18th, 2014 08:00
NickC_UK,
I found a good article concerning the event 153 errors: http://blogs.msdn.com/b/ntdebugging/archive/2013/04/30/interpreting-event-153-errors.aspx
You may have a disk that needs to be replaced; I would run diagnostics on the drives just to confirm.
EventID: 140, http://social.technet.microsoft.com/Forums/en-US/85dda2c8-485a-45f1-b438-80720fb10a7e/ntfs-warning-id-50-id-140-user-profile-disk?forum=winserverTS also points to a hard drive issue.
EventID: 14 - System, volsnap; looks like this follows suit with the previous 2 errors. Since it detects errors on the drive, it aborts writing the shadow copies.
Regards,
NickC_UK
51 Posts
0
March 18th, 2014 11:00
These are brand new SAS disks. If there was a problem, surely the RAID controller would have detected it, wouldn't it?
How do I check individual disks when they are part of a RAID array?
DELL-Geoff P
990 Posts
0
March 18th, 2014 11:00
You can use the online diagnostic package that will test the drives individually. It can be found here:
http://www.dell.com/support/drivers/us/en/19/DriverDetails/Product/poweredge-t420?driverId=TRWYD
Regards,
NickC_UK
51 Posts
0
March 19th, 2014 09:00
Thanks Geoff, I have downloaded that and installed the update to 'Dell 64 Bit uEFI Diagnostics', but how do I run this online? Also, what happens to the RAID array if I test one disk offline? I assume it will then need to be rebuilt, will it?
NickC_UK
51 Posts
0
March 20th, 2014 04:00
In the meantime we have run a Check Consistency on the RAID array, which failed with:
The Check Consistency found inconsistent parity data. Data redundancy may be lost.: Virtual Disk 0 (Virtual Disk 0) Controller 0 (PERC H310 Adapter)
The RAID controller is now resynching; it has been running for about 17 hours and is 75% of the way through.
The issue is that the RAID controller was reporting that all disks were operating fine; without running the RAID controller's Check Consistency we wouldn't have known there was a problem. This is a brand new server, so how can the RAID array be out of sync? Would it not have been synchronised before it left the factory?
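For anyone monitoring a check like this from the OS: if Dell OpenManage Server Administrator (OMSA) is installed, the virtual disk state and consistency-check progress can be watched from the command line. The controller/vdisk IDs below (0/0) match the quoted error text but should be confirmed on the system:

```shell
# Show virtual disk state and current Check Consistency progress (OMSA)
omreport storage vdisk controller=0 vdisk=0

# Physical disk health as seen by the PERC
omreport storage pdisk controller=0
```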
NickC_UK
51 Posts
0
March 20th, 2014 15:00
This problem is being caused by Windows Server Backup trying to back up the Hyper-V host partition. Backup from a virtualised server works fine; just the Hyper-V host has the problem.
We have also identified that this only happens when backing up to a non-RAID disk in the same disk chassis as the RAID array disks. Backing up to an external disk works fine.
The backup fails as follows and then leaves the RAID array corrupted!
Backup failed as shadow copy on source volume got deleted. This might caused by high write activity on the volume. Please retry the backup. If the issue persists consider increasing shadow copy storage using 'VSSADMIN ShadowStorage' command.
EventID: 140 – System, Microsoft-Windows-Ntfs
The system failed to flush data to the transaction log. Corruption may occur in VolumeId: V:, DeviceName: \Device\HarddiskVolume9.
(The I/O device reported an I/O error.)
EventID: 153 - System, disk
The IO operation at logical block address c60 for Disk 0 (PDO name: \Device\00000043) was retried.
EventID: 157 - System, disk
Disk 2 has been surprise removed.
EventID: 517 – Application, Microsoft-Windows-Backup
The backup operation that started at '2014-03-20T15:37:24.984150300Z' has failed with following error code '0x8007045D' (The request could not be performed because of an I/O device error.). Please review the event details for a solution, and then rerun the backup operation once the issue is resolved.
This is obviously an incompatibility between the PERC H310 RAID controller and Windows Server Backup on 2012 R2.
Any known fixes?
DELL-Geoff P
990 Posts
0
March 21st, 2014 07:00
You will need to increase the VSS cache size for backups. The error that is occurring ("Backup failed as shadow copy on source volume got deleted. This might caused by high write activity on the volume. Please retry the backup. If the issue persists consider increasing shadow copy storage using 'VSSADMIN ShadowStorage' command.") means that data held in the cache is being deleted before it can be written to the backup location. Because the data is deleted before the write is acknowledged, you get what is known as dirty cache. When the cache buffer fills up it does a forced flush, which is supposed to flush only the old data that has already been written to the drive, but for some reason it is flushing most or all of the data in the cache rather than checking to flush just the old cache data.
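The shadow storage association can be inspected and enlarged with vssadmin, which is built into Windows; the 20% figure below is only an illustration, adjust it to the volume:

```shell
# Show current shadow copy storage associations and usage
vssadmin list shadowstorage

# Enlarge the shadow storage for C: (example size)
vssadmin resize shadowstorage /for=C: /on=C: /maxsize=20%
```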
Let us know how it works.
NickC_UK
51 Posts
0
March 22nd, 2014 15:00
Just found a report elsewhere which again was on a Dell PERC H310 controller. As no one other than PERC H310 owners is reporting this problem, it strongly suggests it is down to the PERC H310 or its driver. Are there any driver or firmware updates in the pipeline which might cure this?
http://serverfault.com/questions/566591/windows-server-backup-keeps-failing-after-upgraded-my-os-from-windows-server-201/583909#583909
Rgds,
NickC_UK
51 Posts
0
March 22nd, 2014 15:00
All drives are set to unlimited space for Shadow Copies. Latest event log errors below. It seems the root of the problem is that VSS is causing 'Disk 3 has been surprise removed'; any idea why that is happening?
Rgds,
Nick
Disk 3 has been surprise removed.
The IO operation at logical block address 00 for Disk 1 (PDO name: \Device\00000044) was retried.
The system failed to flush data to the transaction log. Corruption may occur in VolumeId: C:, DeviceName: \Device\HarddiskVolume7. (The I/O device reported an I/O error.)
The shadow copies of volume C: were aborted because of an IO failure on volume C:.
Fault bucket , type 0
Event Name: Windows Server Backup Error
The backup operation that started at '2014-03-22T16:44:46.777575300Z' has failed to back up volume(s) 'C:,RECOVERY,X:'. Please review the event details for a solution, and then rerun the backup operation once the issue is resolved.
NickC_UK
51 Posts
0
March 25th, 2014 04:00
Hi Geoff,
Have now set the Shadow Copies limit to approx 20% of maximum size for all disks and volumes.
Not sure that has helped as we seem to be getting the following error more often now:
EventID: 157 - System, disk
Disk n has been surprise removed.
Just a thought, but most of these virtual disks are dynamically expanding .vhdx files which don't yet have much data written to them. Could it be that VSS is trying to use this limit but the virtual disks have not yet been expanded to have that much space?
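As a side note, the current allocation of a dynamically expanding VHDX can be compared against its logical size with the Hyper-V PowerShell module; the path below is a placeholder:

```shell
# Compare a dynamic VHDX's logical size with the space actually allocated on disk
Get-VHD -Path 'D:\VMs\example.vhdx' |
    Select-Object VhdType, Size, FileSize, FragmentationPercentage
```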
Nick
NickC_UK
51 Posts
0
March 25th, 2014 09:00
Have been doing more testing, and the source problem is Event ID 157: Disk n has been surprise removed.
Why is this happening?
I have enabled 'Disk' event logging but no events seem to get written to those logs. How can we establish why this disk is being surprise removed?
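Classic 'disk' source events such as 157 land in the System log rather than the operational Disk channels, so a targeted query there may be more productive; a sketch:

```shell
# Surprise-removal (157) and retried-IO (153) warnings from the classic 'disk' source
Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'disk'
    Id           = 157, 153
} | Format-Table TimeCreated, Id, Message -AutoSize -Wrap
```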
Log Name: System
Source: disk
Event ID: 157
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Description:
Disk 2 has been surprise removed.
Event Xml:
157
3
0
0x80000000000000
20275
System
\Device\Harddisk2\DR4
2
0000000002003000000000009D000480000000000000000000000000000000000000000000000000
Followed by a whole load of:
Log Name: System
Source: disk
Event ID: 153
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Description:
The IO operation at logical block address d90 for Disk 1 (PDO name: \Device\00000044) was retried.
Event Xml:
153
3
0
0x80000000000000
20276
System
\Device\Harddisk1\DR1
d90
1
\Device\00000044
0F0104000400340000000000990004800000000000000000000000000000000000000000000000000028042A
Disk 2 has been surprise removed.
NickC_UK
51 Posts
0
March 27th, 2014 10:00
It seems that the "EventID: 157 - System, disk, Disk n has been surprise removed" error is not the real problem; many others have seen it elsewhere.
The error that is the real problem is:
EventID: 153 - The IO operation at logical block address d90 for Disk 1 (PDO name: \Device\00000044) was retried.
The disk drive has now been tested in the same scenario in a spare HP server, also running 2012 R2, and WSB backs up to it fine, so this is looking strongly like a problem specific to this server or the PERC H310 controller.
NickC_UK
51 Posts
0
April 3rd, 2014 05:00
This has now become a lot more serious: I have been testing backups to a completely different USB-attached SSD drive, and that has now suffered the same problem:
Log Name: System
Source: disk
Event ID: 153
Task Category: None
Level: Warning
Keywords: Classic
Description:
The IO operation at logical block address 8 for Disk 8 (PDO name: \Device\0000008d) was retried.
robertps73123
2 Posts
0
May 8th, 2014 02:00