Do the math. In a 5-disk RAID5 setup (4+1) you indeed have a write penalty of 4. In a RAID1 (or RAID10) setup this penalty is only 2; however, you need twice the number of disks to reach the same capacity, where a 4+1 RAID5 only needs 1.25 times the capacity.
1 read I/O counts as 1 I/O, no matter whether it comes from RAID1 or RAID5.
1 write I/O counts as 2 for RAID1 and 4 for RAID5.
If your application does 750 IOps of which 33% are writes, the IOps going to the disks are 500 reads and 250 writes.
On RAID5 this means 500 + 4x250 IOps on the back end: 1500 IOps.
On RAID1 this means 500 + 2x250 IOps on the back end: 1000 IOps.
When you use a 15k spindle which can handle up to 180 IOps, you need:
1500/180 = 8.33 spindles in RAID5 (4 data disks per RAID group, so you'll need 3 RAID groups), which comes to 15 disks (3 x 4+1 RAID5)
1000/180 = 5.55 spindles in RAID1 (rounded up to an even number and then times 2 for RAID10), which comes to 12 disks (12-disk RAID10)
So a 12-disk RAID10 performs about the same as the 15-disk RAID5.
The next step is the capacity needed. How many GB do you need?
A 12-disk RAID10 provides only 6 x the disk capacity.
A 15-disk RAID5 provides 12 x the disk capacity.
If the amount needed is at most 6 times the disk capacity, you are better off with the RAID10; however, if you need up to 12 times the disk capacity, you'd want to start using the RAID5 setup.
And what is the disk capacity? This can be 73, 146 or 300 GB. You could think about a 12-disk RAID10 made out of 300 GB disks, or a 15-disk RAID5 setup with 146 GB disks. Disk size is no issue in the IOps calculation; speed is.
And what if you're using 10k spindles? Those only handle 130 IOps, so you might want to do the math again for 10k spindles as well. In my example:
When you use a 10k spindle which can handle up to 130 IOps, you need:
1500/130 = 11.54 spindles in RAID5 (4 data disks per RAID group, so you'll need 3 RAID groups), which comes to 15 disks (3 x 4+1 RAID5)
1000/130 = 7.69 spindles in RAID1 (rounded up to an even number and then times 2 for RAID10), which comes to 16 disks (16-disk RAID10)
You see what a difference this makes? For RAID5 the number of spindles stays the same with 10k or 15k spindles, whereas the RAID1 setup goes from 12 to 16 spindles.
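A quick sketch of this arithmetic in Python (my own illustration, not from the thread; the write penalties and per-drive IOps figures are the rule-of-thumb values used above):

```python
import math

# Rule-of-thumb write penalties from this thread:
# RAID1/10 mirrors every write (2 disk I/Os per host write);
# classic RAID5 does read data + read parity + write data + write parity (4).
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4}

def backend_iops(read_iops, write_iops, raid):
    """Front-end read/write IOps -> total back-end (disk) IOps."""
    return read_iops + WRITE_PENALTY[raid] * write_iops

def spindles_needed(read_iops, write_iops, raid, iops_per_disk):
    """Minimum spindle count, before any rounding to whole RAID groups."""
    return math.ceil(backend_iops(read_iops, write_iops, raid) / iops_per_disk)

# The example above: 750 IOps, 33% writes -> 500 reads and 250 writes.
for rpm, per_disk in (("15k", 180), ("10k", 130)):
    for raid in ("RAID5", "RAID10"):
        print(rpm, raid,
              backend_iops(500, 250, raid), "back-end IOps,",
              spindles_needed(500, 250, raid, per_disk), "spindles minimum")
```

Note this gives the raw minimum (9 and 6 spindles at 15k); the further rounding up to whole 4+1 groups and mirrored pairs, as done above, is debated later in the thread.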
I'd suggest you make a spreadsheet with the total IOps the application needs (you can read this value in perfmon on Windows), the number of read and write IOps, and then RAID5 vs. RAID10, 10k vs. 15k, pricing (!!!), capacity needed, space left in the CLARiiON (a new DAE also costs money), and so on.
I hope I've not been too confusing.
(Best practice: RAID10 for the SQL logs and RAID5 for SQL data)
I think those are all valid points with regards to traditional RAID, but keep in mind this is not direct attached storage but a CLARiiON storage array. The FLARE code on the CLARiiON has been optimized to the point where the RAID5 penalty has been compensated for. See the EMC whitepapers on CLARiiON Fibre Channel storage for more details.
The biggest difference in performance you will see between RAID5 and RAID1/0 is in the rebuild of a drive in the event of a drive failure. RAID5 will have a performance impact from a drive failure, where RAID1/0 will not.
I agree with the assessment that RAID1/0 is still a good target for logs, but because of the rebuild issue, not just for raw performance. In the case of an OLTP-type application, though, you may need RAID1/0 to get all the IOPs you can for your application.
I guess the first question really has to be "What is your performance requirement?". How many IOPS are you expecting to drive, and what is the capacity you need available? Without these answers it is very difficult to provide a recommendation. Yes, RAID1/0 is probably going to give you the best edge, but there is a good possibility you don't need to use it for larger database drives. We only use 1/0 for log drives, and all our databases (Oracle, Sybase and SQL) are on RAID5 (tuned to meet the IO requirements). Anything that needs more goes from the CLARiiON over to the DMX3.
Quoting an earlier reply: "The FLARE code on the CLARiiON has been optimized to the point where the RAID5 penalty has been compensated for. See the EMC whitepapers on CLARiiON Fibre Channel storage for more details."
I guess that one has changed in the last 2 years then, since in the CLARiiON performance workshop they taught us to calculate this way. I know they changed it in DMX, but I wasn't aware that it's also enhanced now in CLARiiON. Thanks for the extra tip.
So what's the write penalty now? On DMX it wasn't 4 but 2, with the number of IOps per spindle 30% lower; but in the 71 code it's like you said: write penalties are invisible now.
I'm very curious about this subject. In the end you have 2 copies of your data in a RAID1 setup and only 1 in RAID5, so I still believe RAID1 is better in general... but also more expensive.
If you have an IBM-compatible host (Windows, Linux, VMware, Intel Solaris), you should also remember that aligning your partitions will almost certainly enhance your LUN performance as well. I've seen up to 20% improvement! (I've heard one guy mention a 50% improvement has been seen, but I wonder if that was some sort of peak value or a true average.) 10 to 20% is realistic, especially with SQL and Exchange.
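To illustrate why alignment helps (a sketch of my own; it assumes a 64 KB stripe element and the classic 63-sector MBR partition offset — check your own element size before drawing conclusions):

```python
ELEMENT_SECTORS = 128  # 64 KB stripe element in 512-byte sectors (assumed)

def elements_touched(start_sector, length_sectors, element=ELEMENT_SECTORS):
    """How many stripe elements (hence separate disk I/Os) one host I/O spans."""
    first = start_sector // element
    last = (start_sector + length_sectors - 1) // element
    return last - first + 1

# Aligned partition (starts on an element boundary): one 64 KB I/O, one disk I/O.
print(elements_touched(128, 128))  # -> 1
# Unaligned MBR partition (starts at sector 63): the same I/O straddles two elements.
print(elements_touched(63, 128))   # -> 2
```

Every straddling I/O costs an extra disk I/O (and on RAID5 potentially extra parity work), which is consistent with the 10-20% figures mentioned above.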
Yes, we loaded up on cache (mainly because of SRDF/A), and managed to get several different projects to pay for it. Each one had specific requirements and they all overlapped a bit to give us significantly more cache than we "really need" (according to EMC calculations). I believe we are at 192GB, but we are nowhere near the maximum drive capacity of the array.
The white paper "CLARiiON Fibre Channel Storage Fundamentals" has an appendix on RAID5 that details the EMC optimizations of RAID5 in FLARE. I don't know what rev of FLARE they changed it in, but it was changed a few revs back.
The document has all the details but in short, they use a Modified RAID3 write (MR3) which makes their implementation of RAID5 comparable to RAID1/0 in terms of performance. The main drawback though will still be the performance impact of a rebuild if a drive fails.
Quoting the earlier math: "When you use a 15k spindle which can handle up to 180 IOps, you need: 1500/180 = 8.33 spindles in RAID5 (4 data disks per RAID group, so you'll need 3 RAID groups), which comes to 15 disks (3 x 4+1 RAID5)"
How did the requirement go from 9 spindles (8.33 rounded up) to 15 spindles? The "EMC CLARiiON Best Practices for Fibre Channel Storage" whitepaper doesn't differentiate between data and parity disks when dividing up the IOPS per disk. Also, RAID5 groups don't necessarily need to be 5 disks for performance. This is from the same whitepaper:
"Sometimes EMC personnel will recommend that only 4+1 or 8+1 RAID 5 groups be bound. This is usually not necessary and is often based on an incomplete understanding of the CLARiiON RAID technology.... Many EMC personnel believe that the CLARiiON RAID optimizations work only with a 4+1 or 8+1 stripe, which is not true -- MR3 can work for any size RAID 5 group."
Quoting the earlier math: "1000/180 = 5.55 spindles in RAID1 (rounded up to an even number and then times 2 for RAID10), which comes to 12 disks (12-disk RAID10)"
Again, I don't think you need to double the 6 spindles. Because 1 logical write I/O = 2 physical write I/Os, you've already done the math and accounted for the spindles you need.
One last thing. Even though a 15K RPM drive on average is capable of 180 IOPS, the "sweet spot" in terms of response time is at 75-80% of that (135-144 IOPS). So I would use those figures when doing the math.
I was hoping somebody would respond to this post, since I wasn't really sure... Going from 9 to 15 was because of the data disks in RAID5 (4+1): since 2 RAID groups only have 8 data disks, I had to round up to the next whole number of RAID groups, being 3 RGs, so 15 disks.
But perhaps you're right... Like I said, I wasn't too sure anymore about my math. I'm glad somebody reacted, so I can correct the math in my head.
To clarify for RRR: yes, you factored the RAID overhead into the disk IOPs required, so you shouldn't then apply any multiplication factor after calculating the drives required. E.g. for your 1500 IOPs with 15K drives, 1500/180 = 8.33, thus an 8+1 R5 would be a good fit, or 2 x 4+1 R5.
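The corrected sizing step, sketched in Python (my illustration; 180 IOps per 15k drive is the thread's rule of thumb, and the optional derate parameter reflects the 75-80% "sweet spot" suggestion made earlier, which the next reply disputes):

```python
import math

def drives_required(backend_iops, iops_per_drive, derate=1.0):
    """Back-end IOps already include the RAID write penalty, so no further
    multiplication for RAID overhead is applied after this division."""
    return math.ceil(backend_iops / (iops_per_drive * derate))

print(drives_required(1500, 180))        # -> 9: e.g. one 8+1 R5, or 2 x (4+1)
print(drives_required(1500, 180, 0.75))  # -> 12 when derated to 135 IOps/drive
```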
The rule-of-thumb IOPs number per drive, like 180 for a 15K rpm drive, is there to estimate the required drives when looking for good response time. It is a rule-of-thumb approach to sizing: it isn't specific to the read/write IOP distribution or the IO size, but applies to a mixed-IO, small-block workload. The suggestion of utilizing these drives at 70% of that figure isn't strictly accurate, as the 180 IOPs is derived from the average response time for a 15K drive, not from any kind of saturated test result, and thus can be used as-is for reasonably accurate sizing.
The cache architecture in the CLARiiON array will then add the following. For reads it will intelligently pre-fetch based on sequential access. This is tunable, but normally the default pre-fetch algorithms are the best approach unless you have some kind of unique application. Even without fully sequential access but with some locality, or with true sequential streams that are multi-threaded, the array will kick off pre-fetch as appropriate, normally resulting in extremely good cache hit rates that enhance read response times. Writes will go through cache unless they exceed the default write-aside value of 1MB in size, so writes to disk will typically be handled by watermark flushing. This is optimized for grouping smaller IOs into larger IOs and coalescing full stripes to perform full-stripe writes (MR3, as mentioned in other entries), all of which reduce or negate the need to do the full R+R+W+W RAID5 write penalty. Back-filling stripes and LBA ordering are also employed to enhance de-stage performance from write cache to disk.
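A back-of-the-envelope comparison of the two RAID5 write paths described above (my sketch, assuming a 4+1 group with 64 KB elements; the MR3 internals are of course more involved than this):

```python
def rmw_disk_ops(host_writes):
    """Classic RAID5 read-modify-write: read data, read parity,
    write data, write parity = 4 disk I/Os per host write."""
    return 4 * host_writes

def full_stripe_disk_ops(data_disks):
    """Full-stripe write: parity is computed from the coalesced stripe
    in cache, so each disk in the group is written exactly once."""
    return data_disks + 1  # N data writes + 1 parity write

# Four 64 KB host writes that cache coalesces into one full 256 KB stripe:
print(rmw_disk_ops(4))          # -> 16 disk I/Os the classic way
print(full_stripe_disk_ops(4))  # -> 5 disk I/Os as one full-stripe write
```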
As previously mentioned in other entries, typically RAID5 would be used for database files and the logs would use R1/0. In most instances that will do just fine, but as those replies suggest, you should determine the actual IO requirements and work from there. If you want to reduce the spindle count then RAID5 helps do that, but remember that RAID5 has different capacity utilization, cost and redundancy associated with it. These are typically taken into consideration when choosing which RAID type to use.
The suggestion that 4+1 is best employed when using RAID5 is not a requirement for CLARiiON. There are no special optimizations applied just to 4+1; however, that stripe width does tend to lend itself to easier alignment in some cases, i.e. the default stripe size is 4 x 64 = 256KB, so 256KB of data needs to be grouped together within cache to perform a full-stripe write for that stripe. If the stripe is wider, you need to group more data together. Also, the availability is reduced slightly as you go wider, as is the impact when a disk failure occurs. That said, it's not uncommon to use wider sets, but for alignment it may be beneficial to use 4+1 or 8+1. The optimizations mentioned before are used with all stripe widths, when possible.
I would also suggest reading the CLARiiON Best Practices Guide available from Powerlink.
I've been in DMXs way too long now (not being bothered too much with performance issues), so I just might have to start doing some catching up on CLARiiON/FLARE-related RAID white papers. Performance is always very interesting.
Of course the whole story is "slightly" different when snapshots or clones are active (or even worse: SRDF/S or MirrorView/S). Maybe we can create a simple new thread on how performance SHOULD be calculated? Finding the right way in a thread like this one might be confusing, so starting from scratch and only giving correct information will probably do a lot of good for the community.
Migrating to DMX3 (or DMX4) is always ok