NickC_UK
2 Iron

RE: T-420 server 2012 R2 backup causes disk failure eventIds: 153/140/14

Hi Geoff,

Have now set Shadow Copies Limit to approx 20% of max size for all disks and volumes.

Not sure that has helped as we seem to be getting the following error more often now:

EventID: 157 - System, disk
Disk n has been surprise removed.

Just a thought, but most of these virtual disks are dynamically expanding .vhdx files which don't yet have a lot of data written to them.  Could it be that VSS is trying to use this limit but the virtual disks have not yet grown to have that much space?
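For anyone wanting to apply the same 20% shadow copy limit to several volumes in one go, here is a minimal sketch that only builds the `vssadmin resize shadowstorage` command lines. The function name and the drive letters are placeholders of mine, not something from the server in question, and the commands would still need to be run from an elevated prompt.

```python
def shadow_storage_commands(volumes, max_percent=20):
    """Build one `vssadmin resize shadowstorage` command per volume.

    The volumes and the percentage are illustrative; nothing is executed here.
    """
    if not 1 <= max_percent <= 100:
        raise ValueError("max_percent must be between 1 and 100")
    commands = []
    for volume in volumes:
        commands.append([
            "vssadmin", "resize", "shadowstorage",
            f"/For={volume}",              # volume whose shadow copies are limited
            f"/On={volume}",               # volume that stores the shadow copies
            f"/MaxSize={max_percent}%",    # limit as a share of the storage volume
        ])
    return commands


# Print the commands so they can be reviewed before being run by hand.
for cmd in shadow_storage_commands(["C:", "D:"]):
    print(" ".join(cmd))
```

Printing rather than executing keeps the sketch safe to try on any machine.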

Nick

NickC_UK
2 Iron

RE: T-420 server 2012 R2 backup causes disk failure eventIds: 153/140/14

Have been doing more testing and the root problem appears to be EventID 157: Disk n has been surprise removed.

Why is this happening?

I have enabled 'Disk' event logging but no events seem to get written into those logs.  How can we establish why this disk is being surprise removed?

Log Name:      System
Source:        disk
Event ID:      157
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Description:
Disk 2 has been surprise removed.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="disk" />
    <EventID Qualifiers="32772">157</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2014-03-25T14:59:51.106657500Z" />
    <EventRecordID>20275</EventRecordID>
    <Channel>System</Channel>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Harddisk2\DR4</Data>
    <Data>2</Data>
    <Binary>0000000002003000000000009D000480000000000000000000000000000000000000000000000000</Binary>
  </EventData>
</Event>

Followed by a whole load of:

Log Name:      System
Source:        disk
Event ID:      153
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Description:
The IO operation at logical block address d90 for Disk 1 (PDO name: \Device\00000044) was retried.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="disk" />
    <EventID Qualifiers="32772">153</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2014-03-25T14:59:57.505200300Z" />
    <EventRecordID>20276</EventRecordID>
    <Channel>System</Channel>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Harddisk1\DR1</Data>
    <Data>d90</Data>
    <Data>1</Data>
    <Data>\Device\00000044</Data>
    <Binary>0F0104000400340000000000990004800000000000000000000000000000000000000000000000000028042A</Binary>
  </EventData>
</Event>

Disk 2 has been surprise removed.
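When there is "a whole load" of these events, it can help to pull the key fields out of the exported Event XML rather than reading each one by hand. The following is a rough sketch only: the function name is made up, the field positions inside `EventData` are taken purely from the two samples above, and treating the logical block address as hexadecimal is an assumption based on the value `d90`.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Event XML shown above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the 153 event from this post, used as sample input.
SAMPLE_153 = r"""
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="disk" />
    <EventID Qualifiers="32772">153</EventID>
    <TimeCreated SystemTime="2014-03-25T14:59:57.505200300Z" />
  </System>
  <EventData>
    <Data>\Device\Harddisk1\DR1</Data>
    <Data>d90</Data>
    <Data>1</Data>
    <Data>\Device\00000044</Data>
  </EventData>
</Event>
"""


def summarize_disk_event(event_xml):
    """Return the event ID, timestamp, device and disk number of one event."""
    root = ET.fromstring(event_xml.strip())
    system = root.find("e:System", NS)
    event_id = int(system.find("e:EventID", NS).text)
    when = system.find("e:TimeCreated", NS).get("SystemTime")
    data = [d.text for d in root.findall("e:EventData/e:Data", NS)]

    summary = {"event_id": event_id, "time": when,
               "device": data[0] if data else None}
    if event_id == 153 and len(data) >= 3:
        # Assumed layout for 153: device, LBA (hex), disk number, PDO name.
        summary["lba"] = int(data[1], 16)
        summary["disk"] = int(data[2])
    elif event_id == 157 and len(data) >= 2:
        # Assumed layout for 157: device, disk number.
        summary["disk"] = int(data[1])
    return summary


print(summarize_disk_event(SAMPLE_153))
```

Sorting the summaries by `time` should make it easier to see whether the 153 retries always follow a 157 on a different disk, as they do in the samples above.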

NickC_UK
2 Iron

RE: T-420 server 2012 R2 backup causes disk failure eventIds: 153/140/14

It seems that the "EventID: 157 - System, disk, Disk n has been surprise removed" error is not the problem; many others have seen this elsewhere.

The error that is the real problem is:
EventID: 153 - The IO operation at logical block address d90 for Disk 1 (PDO name: \Device\00000044) was retried.

The disk drive has now been tested in the same scenario in a spare HP server, also running 2012 R2, and WSB backs up to it fine, so this is strongly looking like a problem specific to this server or the Perc 310 controller.

NickC_UK
2 Iron

RE: T-420 server 2012 R2 backup causes disk failure eventIds: 153/140/14

This has now become a lot more serious. I have been testing backups to a completely different USB-attached SSD drive, and that has now suffered from the same problem:

Log Name:      System
Source:        disk
Event ID:      153
Task Category: None
Level:         Warning
Keywords:      Classic
Description:
The IO operation at logical block address 8 for Disk 8 (PDO name: \Device\0000008d) was retried.

robertps73123
1 Copper

RE: T-420 server 2012 R2 backup causes disk failure eventIds: 153/140/14

EventID: 519 - Application, Microsoft-Windows-Backup
The backup operation that started at '2014-03-22T16:44:46.777575300Z' has failed to back up volume(s) 'C:,RECOVERY,X:'. Please review the event details for a solution, and then rerun the backup operation once the issue is resolved.

I have researched this for 3 days. Microsoft kept the Recovery partition at 300 MB, so they made changes to the WinPE Recovery partition, and I cannot get anyone to tell me what they did or how they did it. The workaround is to give the Recovery partition 400 MB using diskpart and disk administrator. I have eval copies where the backups are flawless and the Recovery partition is 300 MB.

Until they get a fix that does not hack up the partitions on the boot drives, I'm using reagentc /disable before running the backup, which then skips the Recovery partition, and reagentc /enable when the backup is finished, each from an elevated command prompt.

Robert
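The disable, back up, re-enable sequence is easy to leave half done if the backup fails partway, so here is a minimal sketch of one way to wrap it. The function names are made up for illustration, and the `wbadmin` target and include drive letters are placeholders, not the actual job from this thread. The real run would need an elevated prompt.

```python
import subprocess


def backup_with_recovery_disabled(backup_cmd, run=subprocess.run):
    """Disable WinRE, run the backup, and always re-enable WinRE afterwards.

    `run` is injectable so the sequence can be checked without touching
    the system; by default it is subprocess.run.
    """
    run(["reagentc", "/disable"], check=True)
    try:
        return run(backup_cmd, check=True)
    finally:
        # Runs even if the backup command raises, so WinRE is not left off.
        run(["reagentc", "/enable"], check=True)


def print_only(cmd, check=True):
    """Dry-run stand-in for subprocess.run: show the command, do nothing."""
    print(" ".join(cmd))


# Placeholder backup job: target E: and source C: are assumptions.
EXAMPLE_BACKUP = ["wbadmin", "start", "backup",
                  "-backupTarget:E:", "-include:C:", "-quiet"]

# Dry run: prints the three commands in the order they would execute.
backup_with_recovery_disabled(EXAMPLE_BACKUP, run=print_only)
```

Dropping the `run=print_only` argument would execute the commands for real. The `try`/`finally` is the point of the sketch: the recovery environment gets re-enabled whether or not the backup succeeds.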
robertps73123
1 Copper

RE: T-420 server 2012 R2 backup causes disk failure eventIds: 153/140/14

Go ahead and laugh, I was having the same thing; my fault was in the power options on the USB Power tab.

Robert
Chris.G
2 Bronze

RE: T-420 server 2012 R2 backup causes disk failure eventIds: 153/140/14

I realize this is a year old issue, but considering it's still a top Google result for this very specific issue, I figured I'd share my resolution. 

I updated the H310 firmware from:

20.12.1-0002 -> 20.13.1-0002

And updated the RAID driver from:

6.6(something) -> 6.803.21.00

And the issue was immediately resolved following reboot. 

I am using SATA drives in the machine for backups. USB drives worked fine. The SATA drives failed as soon as I added them. The event IDs were exactly the same as the OP's and pointed to a drive failure; however, the problem was more likely related to reading from a drive on the RAID controller while writing to another drive on the same controller.

Not to say that the firmware or driver updates contained something magical. Perhaps there was some corruption in the original firmware or drivers that was corrected by the update, but everything is good now.

Unfortunately it took forever to find this post and start troubleshooting the controller, instead of the OS and drives themselves. 

Hope someone else will benefit from this. 

NickC_UK
2 Iron

RE: T-420 server 2012 R2 backup causes disk failure eventIds: 153/140/14

Good to know there is finally a solution to this; it's just a shame Dell didn't admit to the fault and release the fix earlier.  Too late for us, as this server is no longer in production; we can't have production servers with faults like this.  The H310 firmware update has only been available since 03 Apr 2015.

Let's hope your solution prevents others having to throw away a new server like we did.

KyleNguyen
1 Copper

RE: T-420 server 2012 R2 backup causes disk failure eventIds: 153/140/14

I'm having the exact same issues as the OP. I'm following your tip and updating the firmware and drivers. Hopefully, your solution works for me as well. Thanks so much! 

lfeldman
1 Nickel

RE: T-420 server 2012 R2 backup causes disk failure eventIds: 153/140/14

This is BAD.

Allowing the controller to resync from inconsistent parity WILL corrupt data.

What you're describing overall sounds like multiple drive failure.

Making the assumption that the drives are new doesn't rule that out either since they can easily be DOA.

When drives start popping in and out of the array like that, you cannot rely on that array being stable.
