
January 31st, 2016 22:00

XDP simultaneous dual drive failure question

Hi everyone,

I read the XDP whitepaper and have a question about the dual drive failure scenario.

Because XDP is 23+2, it seems that we cannot sustain a simultaneous dual drive failure without data loss.

However, it goes on to say that we can have no impact for up to 5 failed SSDs per X-Brick, so I am a little confused.

Does the dual drive failure mean both drives fail at the same exact moment? Or does it refer to a single SSD failing and a second one failing while the first is rebuilding?

Also, what is meant by 5 failed SSDs per X-Brick and no impact? This seems to contradict the statement that we cannot sustain a simultaneous dual drive failure.

64 Posts

February 1st, 2016 18:00

XDP isn't exactly 23+2, but it's a good simple way to think of it.

The dual parity (+2) means that up to 2 disks can fail at a time - that can mean both fail simultaneously, or that one fails and then the second fails before the rebuild of the first one completes.

Where XDP is very different from RAID is that when a disk fails, it doesn't rebuild onto a dedicated hot-spare disk, but instead re-lays out the data to change the stripe size. So if you're normally operating at 23+2 and one disk fails, then after the rebuild has completed it'll be a healthy 22+2 layout. If another disk fails, it'll become 21+2. If two more fail simultaneously, then after the rebuild it'll be 19+2, and so on - down to a minimum of 18+2 (thus your 5 disks that can fail with the array still healthy).
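As a rough sketch of that stripe-shrinking arithmetic (just the data+parity counting described above, nothing resembling XtremIO's actual implementation; the 23+2 starting point and 18+2 floor are the figures from this post):

```python
# Rough sketch of the stripe-shrinking arithmetic described above -- not
# XtremIO's actual implementation. 23+2 start and 18+2 floor are from the post.

PARITY_COLUMNS = 2        # dual parity (+2)
MIN_DATA_COLUMNS = 18     # minimum healthy layout mentioned above (18+2)

def layout_after_failures(failed_disks, initial_data_columns=23):
    """Return the (data, parity) column counts once rebuilds for `failed_disks` complete."""
    data = initial_data_columns - failed_disks
    if data < MIN_DATA_COLUMNS:
        raise ValueError("more failures than the layout can absorb while staying protected")
    return data, PARITY_COLUMNS

for failed in range(6):
    d, p = layout_after_failures(failed)
    print(f"{failed} failed disk(s) -> healthy {d}+{p} layout")
```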

Of course, all of this presumes that there's enough free space on the array to handle the loss of the disks. The first failure has reserved space available (spread out across all of the disks in the X-Brick), so there's no impact to available space. Each subsequent failure will reduce the available space on the array, to allow for the hot-spare space.
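To put the space accounting in rough numbers (the per-SSD size and disk count below are assumed example values, not XtremIO specs, and parity overhead is ignored):

```python
# Illustrative capacity accounting only: one disk's worth of spare space is
# pre-reserved, so the first failure costs no usable capacity; each further
# failure does. Numbers are assumed examples, not XtremIO specifications.

TOTAL_SSDS = 25
SSD_CAPACITY_TB = 0.4     # assumed per-SSD capacity

def usable_capacity_tb(failed_disks):
    baseline = (TOTAL_SSDS - 1) * SSD_CAPACITY_TB        # one disk's worth held as spare
    reduction = max(0, failed_disks - 1) * SSD_CAPACITY_TB
    return baseline - reduction

for failed in range(4):
    print(f"{failed} failed disk(s) -> {usable_capacity_tb(failed):.1f} TB usable (pre-parity)")
```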

In practice, this is all very much academic. SSDs fail far less frequently than spinning disks - the drives in XtremIO have a MTBF of 2 million hours (around 228 years), and we're actually seeing them fail even less frequently than that!  The odds of a single disk failure are low - the odds of a dual disk failure are minuscule.
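For anyone who wants to sanity-check that MTBF figure, here's the back-of-the-envelope arithmetic, assuming a constant (exponential) failure rate - a simplification, not a vendor reliability model:

```python
# Back-of-the-envelope check of the 2-million-hour MTBF quoted above,
# assuming a constant (exponential) failure rate.
import math

mtbf_hours = 2_000_000
hours_per_year = 24 * 365

mtbf_years = mtbf_hours / hours_per_year                          # ~228 years
annual_failure_rate = 1 - math.exp(-hours_per_year / mtbf_hours)  # ~0.4% per drive-year

print(f"MTBF: {mtbf_years:.0f} years")
print(f"Annualised failure probability per drive: {annual_failure_rate:.2%}")
```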

64 Posts

February 1st, 2016 18:00

dynamox wrote:

less frequently than 228 years?

As odd as it sounds, yes!  It's all down to statistics - I wrote about it some time back over here.

14 Posts

February 1st, 2016 18:00

I'm curious - have you ever tested the impact on performance of the array when a disk fails? I did, and I saw a significant, multi-second drop to zero IO while the array sorted itself out.

2 Intern • 20.4K Posts

February 1st, 2016 18:00

scotthoward wrote:

In practice, this is all very much academic. SSDs fail far less frequently than spinning disks - the drives in XtremIO have a MTBF of 2 million hours (around 228 years), and we're actually seeing them fail even less frequently than that! 

less frequently than 228 years?

2 Intern • 20.4K Posts

February 1st, 2016 20:00

Can't get to that link.

What's the probability of a 3+1 RAID5 group experiencing a double drive failure? We had a call with an EMC statistician who told us 70 or so years - which sounded really "good" as we were recovering an NS80 from a dual drive failure in the same RAID group and major file system corruption on the datamovers.
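For context on how that kind of estimate is typically made (this is illustrative only, not the model EMC's statistician used; the MTBF and rebuild window are assumed example values, and drives are assumed independent with a constant failure rate):

```python
# Illustrative estimate of the chance that a second drive in a 3+1 RAID5 group
# dies during the first drive's rebuild. All inputs are assumed example values.
import math

mtbf_hours = 1_200_000     # assumed MTBF for a spinning disk
rebuild_hours = 24         # assumed rebuild window
surviving_drives = 3       # remaining members of the 3+1 group

p_one = 1 - math.exp(-rebuild_hours / mtbf_hours)       # one given drive fails in the window
p_second_failure = 1 - (1 - p_one) ** surviving_drives  # at least one of the three does

print(f"P(second failure during rebuild) ~ {p_second_failure:.1e}")
```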

64 Posts

February 2nd, 2016 13:00

jgebhart wrote:

I'm curious, have you ever tested the impact to performance of the array when a disk is failed? I did and I saw a significant, multi-second, drop to zero IO while the array sorted itself out.

It really depends on the nature of the failure.  Once the disk is lost the array will handle it immediately and with no impact - but the problem is that it might take a short period to actually realize that the disk has been lost.

For a true disk failure, in most cases the disk itself will still be reachable but will return errors to the array, so the array knows very quickly that the disk is dead and can route around it with minimal to no impact.

However, if the disk fails in a way that causes it to stop responding to the array (such as a complete failure of the disk, or the disk being physically removed from the array), then there will be a timeout before the array realizes the disk is no longer there, and in this case you might see a few seconds of no IO. Thankfully, for real failures this is the least common failure mode - although it's also the most common form of "simulated" failure (i.e. physically removing a disk from the array).
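A toy illustration of why those two failure modes feel different from the host's point of view (this is not XtremIO code, and the timeout value is an assumed placeholder):

```python
# Toy illustration: an explicit I/O error can be routed around immediately,
# while a silent disk costs a full timeout before failover kicks in.
IO_TIMEOUT_SECONDS = 5.0   # assumed placeholder timeout

def read_block(primary_read, degraded_read):
    """Try the primary disk; fall back to a parity-based (degraded) read on failure."""
    try:
        return primary_read(timeout=IO_TIMEOUT_SECONDS)
    except IOError:
        # Disk answered with an error: the failure is known immediately, minimal impact.
        return degraded_read()
    except TimeoutError:
        # Disk never answered: the failure is only known after waiting out the
        # timeout -- that wait is where the few seconds of stalled IO come from.
        return degraded_read()
```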

32 Posts

February 2nd, 2016 14:00

Thank you scotthoward!

Just to clarify: the "rebuild" refers to when XIO is actively changing the stripe size after a disk failure. So I can have 2 fail simultaneously, resulting in a 21+2, but as long as a third doesn't fail while the rebuild / re-stripe is happening, no data loss will occur.

Is this rebuild a long procedure similar to a classic disk rebuild? Or is this a somewhat instantaneous event as data is moved around to accommodate the new stripe size?

727 Posts

February 8th, 2016 21:00

Yes - if two drives fail at the same time (or the second one fails while the first one is rebuilding), you would have a stripe size of 21+2 for the new data that is coming in. As long as a third drive does not fail during this rebuild process, there is no data loss. However, once the rebuild process has finished, you can lose another SSD and still be protected; an automatic rebuild process starts up again (assuming you have additional space available in the X-Brick to accommodate the new rebuild).

The rebuild process leverages some of the unique benefits that XDP technology provides (more reads from DRAM saves on time). Additionally, the rebuild process happens in parallel across all the remaining SSDs in that X-Brick, and we need to rebuild only the user data portion of the failed SSD instead of the whole SSD capacity. Both these features accelerate the rebuild process significantly. The actual rebuild time will depend on the amount of data that needs to be rebuilt and also the current workload on the array.
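As a very rough way to think about the rebuild time described above (only the used capacity is rebuilt, spread across the remaining SSDs; the per-SSD throughput here is an assumed placeholder, not a published XtremIO figure):

```python
# Rough rebuild-time model: rebuild only the failed SSD's used capacity, with
# the work spread across the surviving SSDs. Throughput is an assumed value.

def rebuild_time_minutes(used_capacity_gb, remaining_ssds, per_ssd_rebuild_mbps=100):
    aggregate_mbps = per_ssd_rebuild_mbps * remaining_ssds
    seconds = (used_capacity_gb * 1024) / aggregate_mbps
    return seconds / 60

# Example: 200 GB of user data on the failed SSD, 24 surviving SSDs in the X-Brick.
print(f"~{rebuild_time_minutes(200, 24):.1f} minutes")
```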
