
February 12th, 2015 14:00

R5, R6, and parity

Greetings,

This could be a general discussion rather than one tied to a specific model; it is all about RAID.

In a VNX, if I create a pool with the following disks:

- Extreme performance: 5 disks, RG 5 (4+1)

- Performance: 5 disks, RG 5 (4+1)

- Capacity: 8 disks, RG 6 (6+2)

The above pool has no hot spare configured.

In case of RAID 5: one disk fails. Will the host continue accessing its data (with access time impacted), or will it face some sort of data unavailability until the faulty disk is replaced and the RAID rebuilt?

In case of RAID 6 and the same scenario (one disk failed), what is the outcome?

Thanks,

February 12th, 2015 15:00

Keep in mind that, for both RAID-5 and RAID-6, parity is distributed across all disks in the disk group.

As a simple picture for your (4+1) RAID-5, let's visualize the following, where "D" represents data and "P" represents parity on any one of the disks in the 5-disk group:

Disk #

1 2 3 4 5

D D D D P

P D D D D

D P D D D

D D P D D

D D D P D

In the case of a single disk failure, all data will still be available to the host/application. In the representation above, let's assume that disk #5 fails. If the host/application has to read data from the first row, since only the "P" parity was lost, all existing "D" data is still available. If the host/application has to read data from the second through fifth rows, because "D" data was lost due to the drive failure, the remaining "D" data from the row and the associated "P" parity are used to recreate the lost "D" data on the fly... So while the response time experienced may be slightly longer due to the on-the-fly calculation, all user data is still available for host/application access...
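If it helps to see the arithmetic behind that on-the-fly reconstruction, below is a rough Python sketch of the XOR relationship a RAID-5 group relies on for a single missing member. It is only a toy illustration of the principle (one tiny "block" per member, no parity rotation), not how the VNX actually implements it:

from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# A simplified 4+1 stripe: four data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)                  # P = D1 ^ D2 ^ D3 ^ D4

# Pretend the disk holding the third data block has failed.
failed = 2
survivors = [d for i, d in enumerate(data) if i != failed]

# Degraded read: the lost block is recreated from the survivors plus parity.
recovered = xor_blocks(survivors + [parity])
assert recovered == data[failed]           # all user data is still available
print("Recovered block:", recovered)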

In the case of RAID-6 (6+2), the same basic idea will apply, except there will be double "P" parity available...

Another simple picture is shown below:

Disk #

1 2 3 4 5 6 7 8

D D D D D D P P

P D D D D D D P

P P D D D D D D

D P P D D D D D

D D P P D D D D

D D D P P D D D

D D D D P P D D

D D D D D P P D

In the case of a single disk failure, the same "basic" logic will apply, and all data will still be available to the host/application, because both "D" data and "P" parity are available for any lost data to be reconstructed, as required.

In both scenarios, once the failed disk is replaced, the existing "D" data and/or "P" parity is used to rebuild what was lost. In some implementations, there is also a "diagonal" approach used to minimize the number of read operations that need to occur for better overall efficiency, but I will not attempt to explain this approach further... I'm sure there are much better write-ups available on the Internet if you need further details...
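To make the rebuild step itself concrete, here is a small continuation of the same toy XOR model (again, an illustration only, not the VNX code path): a replacement member can be repopulated stripe by stripe from the survivors, and it does not matter whether the missing block held data or parity, because the XOR works the same either way.

from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild_member(disks, failed):
    """Recreate every block of a failed member from the surviving members."""
    rebuilt = []
    for stripe in range(len(disks[0])):
        survivors = [disks[m][stripe] for m in range(len(disks)) if m != failed]
        rebuilt.append(xor_blocks(survivors))
    return rebuilt

# Toy 4+1 group with two stripes; member 4 holds parity in stripe 0,
# member 0 holds parity in stripe 1 (rotated, as in the picture above).
disks = [
    [b"\x11", b"\x0d"],   # member 0 (parity in stripe 1: 0x05^0x06^0x07^0x09)
    [b"\x22", b"\x05"],
    [b"\x33", b"\x06"],
    [b"\x44", b"\x07"],
    [b"\x44", b"\x09"],   # member 4 (parity in stripe 0: 0x11^0x22^0x33^0x44)
]
print(rebuild_member(disks, failed=3) == disks[3])   # True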

February 12th, 2015 15:00

Perfect, thank you.

R5 can tolerate one disk failure. In the event that one disk fails, how can the host access the data that was lost on the failed disk?

Now, instead of (4+1), we have only 4 disks. How can the data be retrieved from, and stored on, the intended disks? Will there be any performance issue, since with each read request the missing block will be calculated from the others?

Thanks,


February 12th, 2015 15:00

For RAID 5, it can tolerate one drive failure without affecting the availability of data. A subsequent failure will result in data loss.

For RAID 6, with two disks' worth of parity, two failures can be tolerated without affecting the availability of data. As above, any further failure will result in data loss.
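To see why two parities are enough to survive two simultaneous failures, here is a rough Python sketch using the common P/Q (Reed-Solomon over GF(2^8)) construction described in public RAID-6 literature such as the Linux md implementation. The actual VNX parity scheme is not detailed here and may differ; the single-byte "blocks" are purely for illustration.

# GF(2^8) log/antilog tables, primitive polynomial 0x11d, generator g = 2.
GF_EXP, GF_LOG = [0] * 512, [0] * 256
v = 1
for i in range(255):
    GF_EXP[i] = v
    GF_LOG[v] = i
    v <<= 1
    if v & 0x100:
        v ^= 0x11D
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else GF_EXP[GF_LOG[a] + GF_LOG[b]]

def gf_inv(a):
    return GF_EXP[255 - GF_LOG[a]]

def gf_pow2(n):
    """2**n in GF(2^8); n may be negative (interpreted as an inverse power)."""
    return GF_EXP[n % 255]

def pq_parity(data):
    """P is the plain XOR; Q weights member i by g^i before XOR-ing."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow2(i), d)
    return p, q

# A 6+2 stripe, one byte per "block": six data bytes plus P and Q parity.
data = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66]
p, q = pq_parity(data)

# Pretend data members x and y both failed at the same time.
x, y = 1, 4
pxy = qxy = 0
for i, d in enumerate(data):
    if i not in (x, y):
        pxy ^= d
        qxy ^= gf_mul(gf_pow2(i), d)

# Standard two-erasure solve: Dx = A*(P^Pxy) ^ B*(Q^Qxy), Dy = (P^Pxy) ^ Dx.
denom = gf_inv(gf_pow2(y - x) ^ 1)
a = gf_mul(gf_pow2(y - x), denom)
b = gf_mul(gf_pow2(-x), denom)
dx = gf_mul(a, p ^ pxy) ^ gf_mul(b, q ^ qxy)
dy = (p ^ pxy) ^ dx
assert (dx, dy) == (data[x], data[y])   # both lost blocks recovered
print("Recovered:", hex(dx), hex(dy))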


February 12th, 2015 15:00

With RAID 5, the parity info is distributed across all the drives in the group. If one drive fails, the missing data is reconstructed from the surviving drives (data plus parity), so all data is intact from the host's perspective.

While the group has a failed drive, there will be a performance impact, up until you replace it and a rebuild completes.

February 13th, 2015 01:00

Excellent, thank you so much.

Now I understand how R5 and R6 handle read requests from the host in the event of a disk failure; however, could you please clarify how write requests work in both cases.

R5 (4+1) with one failed disk: how will the new data to be stored be distributed over the surviving disks?

1 2 3 4 5

D D D D P

D D D P D

D D P D D

D P D D D

P D D D D

Assume disk 5 failed. The new data will be distributed over the remaining disks (1, 2, 3, 4), but where is the parity in such a case? Will the storing algorithm proceed like this, without parity:

1 2 3 4

Old data

D D D P

D D P D

D P D D

P D D D

New data

D D D D

D D D D

D D D D

D D D D

Many thanks,

February 13th, 2015 05:00

I suspect the more important question is whether or not your array has an available spare disk to initiate a rebuild operation...

February 15th, 2015 01:00

You suspect?

If you see someone asking whether the configured RAID level is doing its job properly, then you have to assume that no spare disks are available.

The hot spare disks will be placed in the box sooner or later; however, I just wanted to understand RAID behavior in the event of a disk failure.

Could you please describe how a write request is handled after one disk has failed in R5?

February 16th, 2015 10:00

Yes, I always have to suspect, because more often than not I cannot know where you are going with your questions, nor do I know all the details about the specific environment...

I would also suggest that utilizing RAID-5 as a data protection method goes hand-in-hand with the expectation that a failed disk will not be replaced with a "sooner or later" mentality, because the possibility of data loss has significantly increased while you are in a degraded mode of operation. If the appropriate spares are not readily available, and the ability to get them may be delayed, then the necessity of RAID-6 should be seriously considered...

Regardless, I better understand your current interest now, and I will describe how today's VNX storage array with MCx handles the situation by utilizing "Rebuild Logging"... The following write-up was taken from the VNX MCx Multicore Everything white paper, specifically page 49.

                                                            ++++++++++++++++++++++++++++++++++++

When an actively used VNX drive goes offline (for example, it fails, is pulled out of the slot, or is temporarily unresponsive due to some other reason), Multicore RAID enables a RAID protection mechanism called Rebuild Logging. The RAID group itself is said to be in a degraded state when one of its drives is missing. This is true for all redundant [1] RAID types.

Rebuild logging marks every data block that should have been updated on a missing RAID member. The marking is stored as a block map as part of the RAID group metadata. RAID metadata is stored within the RAID group disks themselves. Think of RAID metadata as a small hidden LUN at the bottom range of the RAID group.

Figure 44 -- Rebuild logging


Figure 44 -- Rebuild logging shows a Full RAID Stripe being built in cache (including parity). However, since the RAID group is degraded (member 3 is missing), the data meant for member 3 is discarded in the process of writing the full stripe to the disks (commonly known as Multicore Cache flush). The RAID group metadata (shown as RAID Log in the graphic) will be updated with a record of the member 3 blocks that should have been updated as part of a full stripe write.

For a partial stripe write, the logic remains similar. Parity is calculated based on the full stripe information (missing blocks are read from the disks). However, since member 3 is missing, parity will not be valid after the drive comes back online. Therefore, the log is updated with the location of the parity blocks. After the drive is back online, Multicore RAID simply rebuilds the blocks marked in the log, and turns off rebuild logging.

Rebuild logging is active for the entire duration that a RAID group is in a degraded mode. If an array does not invoke a hot spare after 5 minutes (for example, when there aren't unbound compatible drives present in the system), rebuild logging continues to run until the RAID group is brought offline or a suitable drive is found.

The log is a bitmap that is created during RAID group creation as part of the metadata area. The size of the RAID group overhead is defined by this log; therefore, it cannot run out of space. The log is persistent: a rebuild process interrupted by a reboot continues after power is restored.

Rebuild logging enables VNX to avoid full RAID group rebuilds. With it, VNX can re-use good data from a drive that was temporarily inactive, significantly reducing the need to rebuild entire drives.


[1] VNX supported redundant RAID types are: RAID6, RAID5, RAID3, RAID1, and RAID10

                                                            +++++++++++++++++++++++++++++++++++++
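To make the rebuild-logging idea a little more tangible, here is a toy Python model of the behavior described in the excerpt: while a member is offline, writes that would have touched it are only recorded in a log, and when the member returns only the logged stripes are rebuilt. The class, names, and data structures below are invented for illustration and are not the MCx implementation.

from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

class DegradedRaid5:
    """Toy model of rebuild logging for a 4+1 group with one offline member."""

    def __init__(self, disks, missing):
        self.disks = disks        # disks[member][stripe] -> block (bytes)
        self.missing = missing    # index of the offline member
        self.rebuild_log = set()  # stripes whose blocks on the offline member are stale

    def write_full_stripe(self, stripe, data_blocks):
        # Build the full stripe in "cache": data plus XOR parity (parity is
        # kept on the last member here for simplicity, with no rotation).
        blocks = data_blocks + [xor_blocks(data_blocks)]
        for member, block in enumerate(blocks):
            if member == self.missing:
                # The block meant for the offline member is discarded, and the
                # log records that this stripe must be fixed up later.
                self.rebuild_log.add(stripe)
            else:
                self.disks[member][stripe] = block

    def member_returns(self):
        # Only the logged stripes are rebuilt, not the entire drive.
        for stripe in sorted(self.rebuild_log):
            survivors = [self.disks[m][stripe]
                         for m in range(len(self.disks)) if m != self.missing]
            self.disks[self.missing][stripe] = xor_blocks(survivors)
        self.rebuild_log.clear()

# Two stripes, member 2 offline; only stripe 1 is written while degraded,
# so only stripe 1 is rebuilt when the member comes back online.
disks = [[b"\x00"] * 2 for _ in range(5)]
group = DegradedRaid5(disks, missing=2)
group.write_full_stripe(1, [b"\x01", b"\x02", b"\x03", b"\x04"])
group.member_returns()
print(disks[2])   # [b'\x00', b'\x03'] -- only the logged stripe was rewritten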

If you want to better understand some of the internal workings of the VNX with MCx, I would highly recommend you read the above-referenced white paper, which is readily available from EMC.

Best wishes!
