Serge12
1 Nickel

Clariion CX3-20F maximum bandwidth - simple question

I have a simple question about Clariion CX3-20F maximum bandwidth performance.

I was trying to estimate what bandwidth was available to a newly connected host (connected to the Clariion CX3-20F via 2 HBAs) and I got around 250-360MB/sec sustained sequential read (I/O block size 64-2048KB). This is based on 14 new 15K disks configured as 2x RAID 5 6+1. I expected to get a lot more (each disk can do up to 160MB/sec sustained sequential read from the media, so the theoretical max is 12 x 160MB/sec = 1920MB/sec).

So I dug into some specs and I see that the frame has only one back-end bus. I see the following in the documentation (best practices doc):

"then these  five drives can exceed the bandwidth of the back-end bus which  is about 360MB/sec".

Since we have a single back-end bus, I conclude that the Clariion CX3-20F supports a maximum bandwidth of ~360MB/sec, to be shared across all hosts connected to it. So if one host uses 180MB/sec, then all the others will get a maximum of 180MB/sec between them, etc.
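The arithmetic behind this expectation can be written out as a minimal Python sketch. The constants are simply the figures quoted in this post (160MB/sec per disk, 12 data disks, ~360MB/sec for the back-end bus); the function name is hypothetical, and the model deliberately ignores SP CPU, cache and host-side limits.

```python
# Figures quoted in the post above; assumptions, not measurements.
DISK_MEDIA_MB_S = 160   # per-disk sustained sequential read from the drive spec
DATA_DISKS = 12         # 2 x RAID 5 (6+1): 12 data disks plus 2 parity
BUS_LIMIT_MB_S = 360    # back-end bus figure quoted from the best practices doc


def sequential_read_ceiling(data_disks, per_disk_mb_s, bus_limit_mb_s):
    """Return (what the drives could stream, what the bus lets through)."""
    disk_total = data_disks * per_disk_mb_s
    return disk_total, min(disk_total, bus_limit_mb_s)


disk_total, ceiling = sequential_read_ceiling(
    DATA_DISKS, DISK_MEDIA_MB_S, BUS_LIMIT_MB_S
)
print(f"drives could stream {disk_total} MB/sec, bus caps it at {ceiling} MB/sec")
```

Under these assumptions the drives are nowhere near the bottleneck; the bus figure is.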

Am I reading this right?

If the above sounds about right, what Clariion model would be the next closest performance level from CX3-20F (say something with 2 back-end buses)?

Thanks in advance for any insight.

AlanZ1
4 Germanium

Re: Clariion CX3-20F maximum bandwidth - simple question

Your post has been moved to the Unified Storage community where it will likely find the correct audience.

Re: Clariion CX3-20F maximum bandwidth - simple question

In general, each backend bus can sustain 360MB/sec per Storage processor.  So on a CX3-20, there is 1 bus per SP and 2 SPs, 360MB/sec x 2 = 720MB/sec maximum throughput.  This backend throughput is not necessarily what you can get to the host though, it depends on several factors like processor performance in the SP and RAID Group/LUN Layouts.

Since each SP has an effective maximum bandwidth of 360MB/sec to the backend disk, the first thing you will need to do to exceed 360MB/sec on a CX3-20 is balance the load across both SPs.  Create two LUNs, one owned by each SP and present them to the host.  Then either configure the application to read/write to both LUNs simultaneously, or stripe a filesystem across both LUNs so that all reads are hitting both LUNs at the same time.  If you stripe across two LUNs, they need to be from different RAID Groups or you will have head thrashing.

Also, consider that the 6+1 Raid Group has a 384KB stripe, so you will want to try and get write IO sizes to 384KB or a multiple of 384KB.  If you stripe at the host level, then the effective stripe size is 768KB.   If the array is doing all large block, set the cache page size to 16KB as well. In general, if it's true sequential, any IO size that is a multiple of 64KB will be okay though.
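As a rough illustration of the stripe arithmetic above, here is a small Python sketch. The 64KB element size is an assumption inferred from the 384KB figure for a 6+1 group (384 / 6 data disks), and the function names are hypothetical.

```python
ELEMENT_KB = 64  # assumed per-disk stripe element size (384KB / 6 data disks)


def stripe_kb(data_disks, element_kb=ELEMENT_KB):
    """Full-stripe size of a RAID 5 group, counting data disks only."""
    return data_disks * element_kb


def fills_whole_stripes(io_kb, stripe):
    """True when an I/O of io_kb covers an exact number of full stripes."""
    return io_kb % stripe == 0


rg_stripe = stripe_kb(6)      # one 6+1 RAID group
host_stripe = 2 * rg_stripe   # host striping across two such groups

for io_kb in (64, 256, 384, 768, 1024):
    print(io_kb, "KB fills whole 6+1 stripes:", fills_whole_stripes(io_kb, rg_stripe))
```

Only the 384KB and 768KB sizes in that loop line up with the 6+1 stripe, which is why those are the write sizes to aim for.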

To directly answer your question, here are the back-end bus counts for CX3- and CX4-based EMC arrays:

CX3-20 / CX4-120 = 1 backend port per SP

CX3-40 / CX4-240 = 2 backend ports per SP

CX3-80 / CX4-480 = 4 backend ports per SP

CX4-960 = 4 or 8 backend ports per SP
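To put the list above next to the 360MB/sec figure, here is a hedged Python sketch. It assumes the "in general" 360MB/sec per bus per SP applies uniformly to every model listed, which is a simplification, and that each array has 2 SPs. The CX4-960 is left out because the list gives two possible port counts for it.

```python
PER_BUS_MB_S = 360   # "in general" figure from this post; assumed for all models
SPS_PER_ARRAY = 2    # assumed: two storage processors per array

# Back-end ports per SP, copied from the list above.
BACKEND_PORTS_PER_SP = {
    "CX3-20": 1, "CX4-120": 1,
    "CX3-40": 2, "CX4-240": 2,
    "CX3-80": 4, "CX4-480": 4,
}


def backend_ceiling_mb_s(model):
    """Theoretical back-end ceiling; real host throughput will be lower."""
    return BACKEND_PORTS_PER_SP[model] * SPS_PER_ARRAY * PER_BUS_MB_S


for model in BACKEND_PORTS_PER_SP:
    print(f"{model}: {backend_ceiling_mb_s(model)} MB/sec")
```

These are back-end ceilings only; as noted above, SP processor performance and RAID group/LUN layout decide how much of that reaches the host.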

Hope that helps.

RRR
5 Osmium

Re: Clariion CX3-20F maximum bandwidth - simple question

First of all: a bus might have a max bandwidth of 360MBps, but that is for sustained data throughput, and I'm pretty sure that you will not achieve that in a random I/O environment. If you have a 180MBps throughput on 1 host, this most certainly is some sort of backup server: large sequential blocks. Is this what you would use your CX3-20 for?

But to answer your question: yes: if 1 host uses 180MBps, the other hosts have to fight for the remaining 180. But consider this: all I/Os are processed at the same time, it's not like you can reserve half the bandwidth for 1 host and share the rest for the other hosts.

The CX3-40 has 2 buses, but the current model with 2 buses is the CX4-240. The CX3 is a somewhat older model.

And: more cache is better, so if you really need a lot of performance, get more cache, so a larger CX4. More buses is good as well, but to absorb write peaks cache is the word. More buses can transport the data faster to the disks.

It's a mix of things and IT DEPENDS what you'll be using your Storage Array for.

Where did you read that a single disk can do 160MBps? If you have enough cache and data to send to it and if the blocks are large enough you might get these kinds of figures, but a rule of thumb is 10MBps per disk in a random I/O environment. So a Raid Group in RAID5 (6+1) can do around 70MBps under normal workload. I've seen RGs on a CX3 perform up to 200MBps, but this was during LUN migrations in the CX3 internally. If you need to send 360MB per second to your Storage Array, this data has to come from somewhere. Is your host capable of sending this much data?
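The rule of thumb in this reply works out as follows, in a minimal Python sketch. The 10MBps figure is the random-I/O rule of thumb from this post, read here as a per-disk number; the function name is hypothetical.

```python
RANDOM_MB_S_PER_DISK = 10  # rule-of-thumb figure from this reply (random I/O)


def raid_group_random_estimate(disks, per_disk_mb_s=RANDOM_MB_S_PER_DISK):
    """Rough random-I/O throughput estimate for one RAID group."""
    return disks * per_disk_mb_s


one_group = raid_group_random_estimate(7)   # a single 6+1 RAID 5 group
two_groups = 2 * one_group                  # both groups on the array
print(f"one 6+1 group: ~{one_group} MBps, two groups: ~{two_groups} MBps")
```

This estimate applies to random workloads only; it says nothing about the sequential case the original question is about.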

Serge12
1 Nickel

Re: Clariion CX3-20F maximum bandwidth - simple question

Thanks for the fast answers! Here is a 2nd iteration. I hope it will put an end to my struggles.

Please note: I'm optimizing for ***sequential*** throughput, not random.

Once again, on CX3-20F we have two Raid 5 groups 6+1, 6+1 of fast FC disks (sustained i/o from media up to 160MB/sec stated in Seagate spec), and in each RG our host is allocated about 850GB of contiguous space (2 volumes). All LUNs were owned by SPA. On Windows 2008 host I measured max 360 MB/sec sustained sequential read i/o total (about 180MB/sec for each raid group, i/o block size 1MB, 128KB was pretty close to that though). Subsequently I realized this is due to single bus.

After I provided feedback about low performance, this is what was done by our storage guys:

4 LUNs were created in each Raid group, and 2 hybrid(?) metaLUNs were created, each of 4 LUNs from both raid groups.

Raid group  LUN  MetaLUN  AssignedSP  CurrSP
1           11   100      A           A
1           12   100      B           A
1           13   200      A           A
1           14   200      B           A
2           21   100      A           A
2           22   100      B           A
2           23   200      A           A
2           24   200      B           A
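The table can also be checked mechanically. The Python sketch below copies the rows as data and reports three things: LUNs whose current SP differs from the assigned one, how many LUNs each SP currently owns, and which metaLUNs take more than one LUN from the same RAID group. The tuple layout and function names are hypothetical, chosen only for this illustration.

```python
from collections import Counter, defaultdict

# (raid_group, lun, metalun, assigned_sp, current_sp), copied from the table above
LAYOUT = [
    (1, 11, 100, "A", "A"),
    (1, 12, 100, "B", "A"),
    (1, 13, 200, "A", "A"),
    (1, 14, 200, "B", "A"),
    (2, 21, 100, "A", "A"),
    (2, 22, 100, "B", "A"),
    (2, 23, 200, "A", "A"),
    (2, 24, 200, "B", "A"),
]


def owner_mismatch(layout):
    """LUNs whose current SP is not the SP they were assigned to."""
    return [lun for _, lun, _, assigned, current in layout if assigned != current]


def luns_per_current_sp(layout):
    """How many LUNs each SP is serving right now."""
    return Counter(current for *_, current in layout)


def metas_reusing_a_raid_group(layout):
    """MetaLUNs built from more than one LUN in the same RAID group."""
    per_meta = defaultdict(Counter)
    for rg, _, meta, _, _ in layout:
        per_meta[meta][rg] += 1
    return sorted(m for m, rgs in per_meta.items() if any(n > 1 for n in rgs.values()))


print("owner mismatch:", owner_mismatch(LAYOUT))
print("LUNs per current SP:", dict(luns_per_current_sp(LAYOUT)))
print("metas reusing a RAID group:", metas_reusing_a_raid_group(LAYOUT))
```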

So now the host can see metaLUNs 100 and 200 as two volumes of 850GB each. Note that all LUNs are still owned by SPA. I ran benchmarks and not only did I not see any improvement, I saw a decrease vs what we had before.

Explanation: before, we had two sequential I/O streams, one running in each raid group. The total was bottlenecked by the single bus / single SP. Now, while we still have the single bus / single SP limitation, each of our host volumes accesses all the disks (due to metaLUNs), so when we run sequential I/O on each volume (metaLUN), to the disks this is more like semi-random I/O, hence the decrease in performance. Am I right?

Questions:

1. Is my understanding right that all LUNs in a metaLUN must be owned by the same SP? (in which case an effort to assign different SP's within a metaLUN like shown above was a waste of time).

2. Is my understanding right that since we optimize for sequential i/o and since we can't assign LUNs to different SP's in the same metaLUN there is no benefit to creating metaLUNs striping across raid groups, like they were created above?

2a. Let's say even if there was any benefit to metaLUNs and striping across raid groups, what could possibly be the benefit of creating and using 4 LUNs for each metaLUN instead of 2 (one in each RG)?

3. It looks to me that the best we can do to optimize sequential throughput on the host is to delete all 8 LUNs and create one LUN in each RG, and then assign them to different SP's. I think this is exactly what was suggested. I do not want to do any striping in Windows. For a moment I thought metaLUN will be a substitute but now I realized I was wrong. There is also a factor here that our application is using mostly block sizes 64-256KB, so having more disks (larger stripe 768KB ) is not going to benefit us. Right? Instead it's best to split the disks and have stripe 384KB and have sequential i/o going at each of raid groups independently. Any feedback on my thoughts?

4. Trespassing. This may be related to our config or maybe not, but it looks like on our Windows 2008 host all SPB LUNs want to trespass to SPA, limiting us to 360MB/sec. The same trespassing occurs on another Windows host that is connected to the same array. Any gotchas here? Any specific Powerpath settings to set to make sure trespass doesn't happen or if it happens it is reversed automatically? I think we have all proper licenses and all recent firmware and so on. There are two HBAs and each shows two paths online. Maybe there is some doco that specifically addresses Windows trespassing problems?

Thanks a lot for any more insight!

DaveZ1
2 Iron

Re: Clariion CX3-20F maximum bandwidth - simple question

RE: "I ran benchmarks and not only did I not see any improvement, I saw a decrease vs what we had before."

No doubt. I suggest you read about metas, and also metas relating to sequential performance, in the Best Practices Guide (BPG). You have implemented a worst practice in striping a meta across multiple LUNs from the same RAID group (vertical striping of the meta). That hurts performance by sending the heads all across the disk in what should be a sequential access.

And as you noted, a meta is addressed by only 1 SP, so that's why your current owner for meta 100 is all SPA despite your building the meta from SPA and SPB LUNs.

Furthermore, you are trying to maximize sequential throughput, but you subdivided your LUNs even more (compared to the initial design)! That means lots of seeks for the disks as they go between all the LUN partitions. Not good.

So, here is what you need to do (all of which you will realize once you have read the BPG)

1) Read BPG

2) Rebuild using 2 LUNs per RAID group

3) Stripe at the host or, if you must use metas, use 1 meta per SP with one LUN from each RG in each meta. So, you end up with 2 metas, one for SPA, one for SPB, each meta striped across all RAID groups. Follow the same approach if striping on the host.

Then you'll hit both back-end buses, use all drives, use both SPs, etc. You are still hitting the drives with 2 streams (one from each meta), but it should still be pretty good perf. If not good enough, try 1 LUN per disk group and assign one LUN to each SP as you suggest in point 3 below.
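The layout recommended in steps 2 and 3 can be sketched in Python as plain data. The LUN labels are hypothetical placeholders, not real LUN numbers; the check only encodes the two rules from this reply: one meta per SP, and at most one LUN from each RAID group in each meta.

```python
RAID_GROUPS = [1, 2]
SPS = ["A", "B"]


def recommended_layout(raid_groups, sps):
    """One meta per SP, each taking exactly one LUN from every RAID group."""
    return {
        sp: [(rg, f"RG{rg}-LUN-{sp}") for rg in raid_groups]  # labels are made up
        for sp in sps
    }


def follows_rules(layout):
    """True if no meta takes more than one LUN from the same RAID group."""
    for members in layout.values():
        rgs = [rg for rg, _ in members]
        if len(rgs) != len(set(rgs)):
            return False
    return True


layout = recommended_layout(RAID_GROUPS, SPS)
for sp, members in layout.items():
    print(f"meta owned by SP{sp}: {[label for _, label in members]}")
print("follows the rules:", follows_rules(layout))
```

With two RAID groups and two SPs this yields 2 LUNs per RAID group and 2 metas, matching step 2 and step 3 above.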

FWIW some answers:

1. Is my understanding right that all LUNs in a metaLUN must be owned by the same SP? (in which case an effort to assign different SP's within a metaLUN like shown above was a waste of time).

Correct

2. Is my understanding right that since we optimize for sequential i/o and since we can't assign LUNs to different SP's in the same metaLUN there is no benefit to creating metaLUNs striping across raid groups, like they were created above?

Incorrect. Simply make 2 metas, one for SPA and one for SPB, each striped across at most 1 LUN from each RG, as described above.

2a. Let's say even if there was any benefit to metaLUNs and striping across raid groups, what could possibly be the benefit of creating and using 4 LUNs for each metaLUN instead of 2 (one in each RG)?

None. In fact, you made things worse, as explained above.

3. It looks to me that the best we can do to optimize sequential throughput on the host is to delete all 8 LUNs and create one LUN in each RG, and then assign them to different SP's. I think this is exactly what was suggested.

Yes.

I do not want to do any striping in Windows. For a moment I thought metaLUN will be a substitute but now I realized I was wrong.

It can be but only under the restriction that a meta can bring only 1 SP into the equation.

There is also a factor here that our application is using mostly block sizes 64-256KB, so having more disks (larger stripe 768KB ) is not going to benefit us. Right? Instead it's best to split the disks and have stripe 384KB and have sequential i/o going at each of raid groups independently. Any feedback on my thoughts?

Larger stripe on host can help if the filesystem bundles up multiple IOs into a single request. Meta stripe of ~ 1 MB is fine, see note in BPG about stripe multiplier.

4. Trespassing. This may be related to our config or maybe not, but it looks like on our Windows 2008 host all SPB LUNs want to trespass to SPA, limiting us to 360MB/sec. The same trespassing occurs on another Windows host that is connected to the same array. Any gotchas here? Any specific Powerpath settings to set to make sure trespass doesn't happen or if it happens it is reversed automatically? I think we have all proper licenses and all recent firmware and so on. There are two HBAs and each shows two paths online. Maybe there is some doco that specifically addresses Windows trespassing problems?

Not sure what is happening here. If the primary owner of the LUN (or meta) is listed as SPB in the Navi properties page for that LUN, then PowerPath will move that LUN to SPB.

Serge12
1 Nickel

Re: Clariion CX3-20F maximum bandwidth - simple question

Thank you all for your helpful suggestions. This is a summary and conclusion of this thread, for the benefit of all future readers, and as a reminder to myself.

The original question was how to optimize sequential throughput using 2 Raid groups 6+1 Raid 5 on CX3-20F.

There were 3 suggestions:

A1. Create one LUN per raid group, assign to different SPs, and present to host. Load balance I/O on the Windows host (simultaneously accessing both LUNs).

Theoretical max 360MB/sec per LUN, 720 MB/sec total under ideal circumstances. Benched close to 500 MB/sec.

A2. Same as A1, but stripe two LUNs on the host (no need to load balance).

Theoretical max 720MB/sec total. Benched close to 500 MB/sec.

A3. Create two LUNs per raid group. Vertically stripe across raid groups, by creating two MetaLUNs. Assign each MetaLUN to different SP. Present two MetaLUNs to host.

The scenario that won was A1. It produced the highest total throughput (up to 500MB/sec under certain scenarios, out of a 720MB/sec theoretical max). It also worked well in our apps. Scenario A2 sounded like a winner (discounting the extra risks of Windows software striping) and produced good Iometer benches. But in our apps it was slower than A1.

I never got to test a proper A3 config. All I had to test was a config with 4 LUNs per raid group and 2 MetaLUNs (4 LUNs each, from both raid groups), both assigned to SPA. I believe the LUNs were vertically striped and horizontally concatenated, but I'm not sure. Maybe they were striped horizontally and vertically instead. It is interesting that under this scenario, a single MetaLUN (hitting all 14 drives, both raid groups) benched just a bit higher than a single LUN under A1 (hitting 7 drives, one raid group). So 7 drives just about exhausted the frame capabilities in our specific setup. Once both MetaLUNs were accessed simultaneously, the speed dropped quite a bit compared to A1.
