kelleg
4 Operator
•
4.5K Posts
0
May 1st, 2009 07:00
1. Are you using the same test with each LUN - same IO size?
2. Are you using Analyzer to capture the data for the test? Is the Archive Interval set to 60 seconds to capture the best data?
3. Have you looked at the disk IOPS for each LUN?
4. Is either LUN using the vault drives?
5. How long does the test run?
glen
kelleg
4 Operator
•
4.5K Posts
0
May 1st, 2009 08:00
Try disabling/enabling the write cache on the new LUN. Also, what is the size of the new LUN and when did you bind it - maybe it's still background zeroing.
glen
alokjain1
44 Posts
0
May 1st, 2009 08:00
What are the numbers at the host level? It may be a good idea to capture the stats at the host level using iostat (Unix) or perfmon (Windows):
- write IOPS per LUN? Hopefully no reads are going on for backups on these LUNs
- service time/response time?
- write block size - compare between the two hosts
- Are they using different SPs for active IO?
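To make the host-level capture concrete, here is a minimal sketch of pulling write IOPS and service time per device out of `iostat -x` output. The column layout varies by platform and sysstat version, so the sample header below (with `w/s` and `await` columns) is an assumption - adjust the parsing to match your host's actual header line.

```python
# Sketch: extract write IOPS and average wait per device from `iostat -x`
# style output. SAMPLE is a hypothetical capture; real column names differ
# between platforms and sysstat versions.

SAMPLE = """\
Device:  rrqm/s wrqm/s   r/s    w/s   rkB/s   wkB/s await  svctm  %util
sdb        0.00  12.40  0.00 580.20    0.00 37132.8  4.10   1.20  69.6
sdc        0.00  10.10  0.00 440.60    0.00 28198.4  9.80   2.10  92.5
"""

def parse_iostat(text):
    lines = text.strip().splitlines()
    header = lines[0].split()
    stats = {}
    for line in lines[1:]:
        fields = line.split()
        # Map column names to values, skipping the "Device:" label column.
        row = dict(zip(header[1:], fields[1:]))
        stats[fields[0]] = {
            "write_iops": float(row["w/s"]),
            "await_ms": float(row["await"]),
        }
    return stats

stats = parse_iostat(SAMPLE)
for dev, s in sorted(stats.items()):
    print(f'{dev}: {s["write_iops"]:.0f} w/s, {s["await_ms"]:.1f} ms await')
```

Capturing a few samples per LUN like this on both hosts makes the write IOPS and service-time comparison in the list above straightforward.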
ironcheflouie
76 Posts
0
May 1st, 2009 08:00
2. Yes, using NaviAnalyzer; it may be at 2 min if I'm not mistaken.
3. Not per disk. But at the LUN level, the old LUN goes up to ~500-600 total IOPS and the new LUN maxes out around ~400+.
4. Nope, no vault here. Both RGs are split between DAEs.
5. 1 hr+
RyanP2
261 Posts
0
May 1st, 2009 10:00
If you're reviewing the analyzer files, take a look at the number of full stripe writes. If the number is high on the 3+1 and low on the 5+1, it means the data is not as sequential as you may think.
Full stripe writes allow for large write IOs on the backend in which only a minimal number of parity updates are needed. If in the 5+1 the data is sequential to a point, but not sequential enough to fill the entire stripe (larger stripe here than the 3+1), then there may be many parity calculations stealing performance from you.
Also, check the drive IOPS like Glen said earlier.
-Ryan
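The full-stripe vs partial-stripe cost above can be sketched with a back-of-envelope model. This assumes a 64 KB stripe element (the CLARiiON default) and a simple read-modify-write for partial stripes; real arrays also do reconstruct writes and cache coalescing, so treat the numbers as illustrative only.

```python
# Rough model of RAID-5 backend disk operations per host write.
# Assumes 64 KB stripe elements; purely illustrative, not a measurement.

def backend_ops(write_kb, data_disks, element_kb=64):
    stripe_kb = data_disks * element_kb
    full, rem = divmod(write_kb, stripe_kb)
    # Full-stripe write: write every data element plus 1 parity write.
    ops = full * (data_disks + 1)
    if rem:
        # Partial-stripe (read-modify-write): each touched element costs a
        # data read+write, plus one parity read+write for the stripe.
        touched = -(-rem // element_kb)   # ceiling division
        ops += 2 * touched + 2
    return ops

for label, data_disks in (("3+1", 3), ("5+1", 5)):
    print(label, "256 KB write ->", backend_ops(256, data_disks), "backend ops")
```

With a 256 KB host write, the 3+1 (192 KB stripe) gets one full-stripe write plus a small remainder, while the same write never fills the 5+1's 320 KB stripe and pays the read-modify-write penalty on every element - exactly the effect described above.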
kelleg
4 Operator
•
4.5K Posts
0
May 1st, 2009 12:00
When you create a new LUN and the disks have been previously used for other LUNs, the new LUN needs to be "zeroed" (filled with zeros to clear all data). This takes place in the background - it is part of the LUN initialization. See the section called "Fastbind" in the Best Practices guide, around page 30.
glen
SKT2
2 Intern
•
1.3K Posts
0
May 1st, 2009 12:00
Take a look at the disk IOPS - that's the real key. If the IO is not exceeding about 180 IOPS, then the response times and queue length at the LUN level should be OK.
ironcheflouie
76 Posts
0
May 1st, 2009 14:00
I can try the disable/enable write cache on the new lun.
The new lun size is 1.3 TB, the old one was 700 GB.
ironcheflouie
76 Posts
0
May 1st, 2009 14:00
No reads, all writes. The old LUN was pushing ~600 IOPS and the new LUN ~450 IOPS, same test.
If I remember correctly, service and response times were fine.
Same SP.
I used IOMeter once before, way back when - maybe a good test.
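The frontend numbers reported in this thread can be bounced against the ~180 IOPS per-drive rule of thumb mentioned earlier. The sketch below brackets the per-disk load between a best case (cache coalesces everything into full-stripe writes) and a worst case (every write pays the RAID-5 small-write penalty of 4 backend ops); the frontend IOPS are the poster's figures, everything else is an assumption.

```python
# Bracket per-disk backend IOPS for a RAID-5 d+1 group, given frontend
# write IOPS. Best case: full-stripe writes (d data + 1 parity op per d
# data elements written). Worst case: 4-op read-modify-write per host IO.
# Illustrative model only.

def per_disk_iops(front_iops, data_disks, full_stripe):
    disks = data_disks + 1                    # RAID-5 d+1 group size
    if full_stripe:
        backend = front_iops * (data_disks + 1) / data_disks
    else:
        backend = front_iops * 4              # RAID-5 small-write penalty
    return backend / disks

for label, d, front in (("3+1 old", 3, 600), ("5+1 new", 5, 450)):
    best = per_disk_iops(front, d, True)
    worst = per_disk_iops(front, d, False)
    print(f"{label}: {best:.0f}-{worst:.0f} IOPS/disk (vs ~180 guideline)")
```

If the new 5+1 LUN is landing near the worst-case end of its range while the old 3+1 coalesces well, the per-disk numbers alone could explain the gap - which is why checking the actual disk IOPS in Analyzer matters.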
ironcheflouie
76 Posts
0
May 1st, 2009 15:00
OK, you got me interested in the number of stripe writes. When I look at my new LUN (assuming I am looking at the correct spot), I see 2543067 stripe crossings (by looking at the properties of the LUN and selecting statistics). The old LUN also has a high number. Does this relate to disk alignment? That's another topic, but VMware states you do not need to align the disk since the VI client will do that for you when you create the datastore (I bet most people are jumping in their seats now).
FYI, all LUNs were presented new.
The guys are basically dumping SQL databases to this LUN, so it should be sequential.
Thanks for all the responses thus far from everyone.
Cheers
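The alignment question above comes down to simple arithmetic, sketched below. It assumes 64 KB stripe elements and the legacy 63-sector (31.5 KB) MBR partition offset; as the poster notes, newer VMware datastores are aligned at creation, so this just illustrates why a misaligned partition inflates crossing counts.

```python
# Sketch: effect of partition offset on element (disk) crossings.
# Assumes 64 KB stripe elements; counts how many sequential 64 KB IOs
# straddle an element boundary for a given starting offset in KB.

ELEMENT_KB = 64

def crossing_pct(start_kb, io_kb, count):
    crossings = 0
    for i in range(count):
        first = (start_kb + i * io_kb) // ELEMENT_KB          # first element touched
        last = (start_kb + (i + 1) * io_kb - 1) // ELEMENT_KB  # last element touched
        if last != first:
            crossings += 1
    return 100.0 * crossings / count

print("aligned:   ", crossing_pct(0, 64, 1000), "% crossings")
print("63-sector: ", crossing_pct(31.5, 64, 1000), "% crossings")
```

An aligned partition keeps element-sized IOs on one disk, while the 63-sector offset makes every one of them hit two disks - doubling the backend work for the same host IO.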
RyanP2
261 Posts
0
May 4th, 2009 07:00
First off, you can open NAR files in Navisphere: go to Tools, Analyzer, Archive, Open. One thing I would do is take a look at Navisphere help. Under Help there is a section called "Analyzing storage-system performance using Analyzer". This has all the performance information on what you are looking at and how to use Analyzer. This is effectively the Navisphere admin guide; they don't produce that document anymore, they just put it into Navisphere help.
As for the stripe crossings, the number you see in the properties screen is historical, so it really doesn't paint the whole picture. Open a NAR and check the disk crossings percentage on a LUN to see the actual data. Disk Crossing % is an "advanced" option, so you need advanced stats enabled: click Tools, Analyzer, Data Logging and check the advanced box. In the Data Logging screen I would also uncheck the box that says "initially check all tree objects".
Per the Navisphere help: Disk Crossing (%) is the percentage of requests that require I/O to at least two disks, compared to the total number of server requests.
-Ryan