
EK


September 1st, 2010 06:00

RAID level for SATA 1TB on CX4-120

Hello,

I have acquired a CX4-120 for backup purposes. It has 35 SATA II 1 TB drives in 3 DAEs.

I've never dealt with SATA drives until now, only FC. I'm hoping to reach 200 MB/s of sequential writes and around 300 MB/s of sequential reads.

I'm wondering what the best RAID group (RG) layout is.

Should I have two 15-drive RGs in RAID 5 (1 LUN per RG), so that no RG spans DAEs? This leaves me with 5 hard-to-use SATA drives (of course I'll set up 1 or 2 hot spares out of these). I don't know if it's recommended to have an RG with drives spanning DAEs. This configuration gives me 12840.52 GB per LUN.

On the other hand, I could have 2 RGs of 16 drives in RAID 6 (still 1 LUN per RG), but this time the RGs span DAEs. I'd have more spindles per RG but about 10% lower write performance than the same layout in RAID 5, along with better protection than RAID 5. This configuration would bind 32 drives and leave 3 (I'd still set up 1 or 2 hot spares out of these).
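As a rough sanity check on the drive accounting and capacity of both layouts, here is a minimal Python sketch. The usable capacity per drive is an assumption back-calculated from the 12840.52 GB figure above; the real bindable capacity may differ slightly.

# Quick sanity check of the two candidate layouts on 35 drives.
# Assumption: usable GB per 1 TB CLARiiON SATA drive is back-calculated
# from the 12840.52 GB per-LUN figure above (14 data drives in a 14+1).
TOTAL_DRIVES = 35
USABLE_GB = 12840.52 / 14          # ~917 GB usable per drive (assumed)

def layout(name, groups, drives_per_group, data_drives_per_group):
    used = groups * drives_per_group
    left = TOTAL_DRIVES - used
    lun_gb = data_drives_per_group * USABLE_GB
    print(f"{name}: {used} drives bound, {left} left over, {lun_gb:,.2f} GB per LUN")

layout("Option 1, 2x RAID 5 (14+1)", 2, 15, 14)
layout("Option 2, 2x RAID 6 (14+2)", 2, 16, 14)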

I'm trying to have RGs with a lot of drives for maximum I/O.

What do you think of my situation? What would you suggest as a good RG configuration?

Thank you in advance for your answers.

BR

Eric

392 Posts

September 1st, 2010 08:00

For backup with SATA drives, using RAID 6 would be prudent.

I note you have not specified how many TB of storage you require. I'll assume your backup is a small percentage of what's available. I recommend you create 3x (8+2) RAID groups; (8+2) is regularly used for backup-type workloads. These RAID groups exceed your bandwidth requirement. Assign one drive as a hot spare and use the other four drives as scratch drives.
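A minimal sketch of the drive accounting and usable capacity behind that recommendation, assuming roughly 917 GB usable per 1 TB SATA drive (a figure back-calculated from the capacities quoted earlier in the thread):

# Drive accounting for 3x (8+2) RAID 6 on 35 drives.
TOTAL_DRIVES = 35
USABLE_GB = 917.18                 # assumed usable GB per 1 TB SATA drive

groups, drives_per_group, data_per_group = 3, 10, 8   # three 8+2 RAID 6 groups
hot_spares = 1

bound = groups * drives_per_group
scratch = TOTAL_DRIVES - bound - hot_spares
print(f"Bound: {bound}, hot spare: {hot_spares}, scratch: {scratch}")        # 30 / 1 / 4
print(f"Usable per group: {data_per_group * USABLE_GB:,.0f} GB")             # ~7,337 GB
print(f"Total usable:     {groups * data_per_group * USABLE_GB:,.0f} GB")    # ~22,012 GB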

Splitting across DAEs on a CX4-120 has no effect.  It only has a single bus.

Sizing of RAID groups, bus balancing, and SATA drive characteristics, as well as recommendations on RAID group provisioning, can be found in the EMC CLARiiON Best Practices for Performance and Availability. In particular, the 'Storage System Sizing and Planning' section at the end is recommended reading.

261 Posts

September 1st, 2010 08:00

Hello Eric, thanks for the question.

Sorry to say, but there is no easy answer to this question. After reading your post I only have more questions.

First off, how much space do you actually need? The answer may dictate which RAID type to use (for example, in RAID 6 you lose 2 drives per group to parity versus 1 in RAID 5, and half of your drives in a RAID 1/0 configuration). If you can't afford to lose more than x drives, then certain RAID types can be eliminated as possible solutions.

Next, you need to figure out whether your bandwidth hopes are realistic. To aid in this, I point you to page 71 of the EMC Best Practices Guide (FLARE 29.0 version). The charts there show what total throughput and bandwidth values were seen under testing (with specific I/O loads and tests, of course). If your I/O size will be on the low side, then large bandwidth numbers may or may not happen. Being sequential will certainly help with this.
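As a very rough feasibility check against the 200 MB/s write / 300 MB/s read targets, here is a back-of-the-envelope Python sketch. The per-spindle rates are placeholder planning assumptions, not measured CLARiiON figures; the charts in the Best Practices Guide are the real reference.

# Back-of-the-envelope check of the bandwidth targets. The per-drive
# sequential rates below are ASSUMED planning numbers, not measured
# figures -- substitute values from the Best Practices Guide charts.
PER_DRIVE_WRITE_MBS = 25   # assumed sustained sequential write per SATA spindle
PER_DRIVE_READ_MBS = 30    # assumed sustained sequential read per SATA spindle

layouts = {
    "2x RAID 5 (14+1)": 2 * 14,   # data spindles only
    "2x RAID 6 (14+2)": 2 * 14,
    "3x RAID 6 (8+2)": 3 * 8,
}
for name, data_drives in layouts.items():
    w = data_drives * PER_DRIVE_WRITE_MBS
    r = data_drives * PER_DRIVE_READ_MBS
    print(f"{name}: ~{w} MB/s write, ~{r} MB/s read (targets: 200 / 300)")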

Are you backing up many LUNs/drives? At the same time? Is there a window in which this needs to get done? Is there a write window to this array and then a read window, or do they overlap? These questions, in my opinion, will decide how many groups to go with. If you need to back up multiple LUNs/drives at the same time, then what may be sequential I/O at the front end will turn into somewhat random I/O on the drives if the devices sit on large RAID groups (the drive heads keep cycling between two or more spots on the drives because of the front-end load, which drives up service times and drives down the achievable throughput/bandwidth). If you have multiple devices to back up at the same time, then more groups with smaller disk counts will be needed.

Another thing to consider is rebuild times. Eventually you will see drive failures, and if you go with very large groups the rebuild times can be huge, depending on the rebuild rate you choose. RAID 6 will help with fault tolerance, but the rebuild impact on performance could be substantial.

I hope these questions/topics help in designing a solution for you. The performance rabbit hole can get very deep very quickly, depending on how many things you are trying to control.

-Ryan

September 3rd, 2010 05:00

Hello,

First I want to thank you guys for your answers.

I went through the EMC CLARiiON Best Practices for Performance and Availability document.

I figured out that for my I/O needs, 2 RGs of 10+2 would be enough.

In my initial design I had 25 drives to work with, so 2 RGs of 10+2 plus 1 HS fit the 25 perfectly.

The thing is that now I must deal with 35 drives.

Anyway, one of my questions about RGs spanning DAEs was related to the parity load on the LCC linking the 2 DAEs used by the RG.

To be more precise, if the RG is bound across 0_1 and 0_2, the LCC connecting the 2 DAEs will have to carry the parity traffic in addition to the traffic for RGs defined in DAE 0_2.

Could this LCC become a bottleneck?

Otherwise I will go with vertical provisioning.

Eric

261 Posts

September 3rd, 2010 07:00

Seeing that the enclosures are daisy-chained together, before an LCC has a bottleneck issue you'll probably see the entire bus hit a bottleneck. The issue you may or may not see would be with bandwidth. If your bus is running at 2 Gb/s, then we are talking about 180-190 MB/s before you exceed the bus; for a 4 Gb/s bus, somewhere near 380 MB/s. I am not sure of the exact numbers, just ballparking it.
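Here is the arithmetic behind that ballpark as a minimal Python sketch. The rule of thumb that an N-gigabit FC link carries roughly N x 100 MB/s of payload is standard, but the efficiency factor below is an assumption chosen to land near the figures quoted above.

# Ballpark of the back-end FC bus ceiling.
# Rule of thumb: an N-gigabit Fibre Channel link carries roughly
# N * 100 MB/s of payload. EFFICIENCY is an assumed allowance for
# protocol overhead, picked to match the numbers quoted above.
EFFICIENCY = 0.95

for gbit in (2, 4):
    nominal = gbit * 100                 # MB/s of payload, FC rule of thumb
    practical = nominal * EFFICIENCY
    print(f"{gbit} Gb/s bus: ~{nominal} MB/s nominal, ~{practical:.0f} MB/s practical")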

I personally have been reviewing NAR files (Navisphere Analyzer archives) for performance cases for 3+ years, and fewer than a handful of times have I found that the back-end bus was the bottleneck. That was before EFD (SSD) drives came around, though. With those and the right load you can quickly push a bus to the max if you are not careful.

For your setup, since you have just a single back-end bus, you don't have the option of spreading the load across buses. Will you see an issue? Maybe, maybe not; it really depends.

Hope this helps.

-Ryan

September 8th, 2010 07:00

Hi,

If the CX4-120 is only used for backup purposes, I suggest you set the cache page size to 16 KB.

Doing this will allow you to send up to 2 MB to the back end in one operation.

With a 16 KB cache page, a 4+1 with a 256 KB stripe size will get 8 stripes written in one operation, or 4 if you have an 8+1.
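The arithmetic behind that, as a minimal Python sketch (the 2 MB back-end flush size is taken from the statement above, and 64 KB is the default CLARiiON element size):

# Full stripes written per back-end operation with a 16 KB cache page.
# The 2 MB flush size comes from the post above; 64 KB is the default
# CLARiiON element size (128 blocks x 512 bytes).
FLUSH_KB = 2 * 1024
ELEMENT_KB = 64

for data_drives in (4, 8):                  # RAID 5 4+1 and 8+1 groups
    stripe_kb = data_drives * ELEMENT_KB    # full-stripe width
    stripes = FLUSH_KB // stripe_kb
    print(f"{data_drives}+1: {stripe_kb} KB stripe -> {stripes} full stripes per 2 MB operation")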

4 RGs of 7+1 plus 1 HS should be a good solution for your configuration.

2.2K Posts

September 8th, 2010 07:00

One thing I would like to add about using a LUN as a backup-to-disk target: depending on how you present that LUN as a backup target, you may find that over time file fragmentation on the LUN severely degrades backup/restore performance. I have one Windows file server with multiple large LUNs masked to it from a CX4-960, and the LUNs are used as backup targets through NTFS shares. The shares are used to back up SQL servers and are then used to restore to development SQL servers. Between the daily deletion of old backup files and the writing of new ones, I have seen file-system fragmentation reach the point where some backup files consist of tens of thousands of fragments.

So our options were to set up a frequent format job for the LUN or to find another solution. The shares are in use very frequently, so it would have been difficult to schedule downtime to format the LUNs, and we opted instead to deploy a Celerra Gateway to present the backup shares. The file system used for the NAS shares doesn't have the same fragmentation issues as NTFS, so it should alleviate a lot of the pain of using Windows shares as a backup target.

September 10th, 2010 07:00

Hi,

Thanks for the info. I changed the cache page size to 16 KB and saw a significant improvement.

But regarding your second suggestion with an element size of 256: I simply cannot bind a LUN with -elsz 256.

When I run the bind command, I get the following error:

Too few or invalid command line parameters: -elsz
Valid usable-size for drive type: 128

Otherwise, I'm currently using 4 RAID groups of 7+1 drives across my 3 DAEs and 35 disks (3 HS).

Test runs are quite good: 300 MB/s writes so far.

Eric

September 10th, 2010 09:00

Hi,

The default element size is 64 KB on the CLARiiON. For example, with an R5 4+1 you get a 4 x 64 KB = 256 KB stripe size.

Just bind the LUN with the default options; the CLARiiON will do the job.
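A minimal Python sketch of where the 256 figure comes from and why the bind rejected it, assuming (per the error message above) that the element size is expressed in 512-byte blocks:

# Why "-elsz 256" was rejected and where 256 KB actually comes from.
# Assumption (consistent with the error above): element size is given
# in 512-byte blocks, and only 128 blocks is valid for this drive type.
BLOCK_BYTES = 512
ELEMENT_BLOCKS = 128

element_kb = ELEMENT_BLOCKS * BLOCK_BYTES // 1024
print(f"Element size: {ELEMENT_BLOCKS} blocks = {element_kb} KB per drive")   # 64 KB

data_drives = 4                              # the R5 4+1 example above
stripe_kb = data_drives * element_kb
print(f"Full stripe for R5 {data_drives}+1: {stripe_kb} KB")   # 256 KB is the whole stripe,
                                                               # not a value for -elsz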

September 15th, 2010 07:00

OK, thank you for the element size info.

I want to thank all of you. I'm now running a fine configuration based on your suggestions.

I'm getting around 250 MB/s of writes and 320 MB/s of reads (sequential, with an R/W size of 128 KB).

I changed the cache page size to 16 KB, with a significant improvement.

I set up 4 RGs of 7+1 with 3 HS.

Thank you again.

Eric
