CLARiiON RAID 6 read performance under simultaneous writes

Question

Hi!

Our customer connected one AIX server to the CX3.

He uses TSM with this server (TSM version 6.1.3.2).

He observed the following performance degradation:

When he writes to the LUN, the write performance is 50 Mb/s, but when he reads from this LUN at the same time he gets only 3-5 Mb/s read performance. It is an HSM system, and this is not acceptable for the customer. When he only reads from or only writes to this LUN, the performance is acceptable.

He wants to solve this performance problem, because he needs to write to and read from the LUN at the same time. Since it is an HSM system, he wants to prioritize read operations over write operations.

We did some performance tests:

Test1: (AIX host LUN read/write test with R6 LUN)
Copied data to this LUN (LUN 10) via NFS and read the data simultaneously with the
TSM HSM client, called dsmmigrate.
The write performance is 50-60 Mb/s; however, the read performance is quite
poor in this case (5-10 Mb/s).
The customer wants to improve the read performance seen in this test.

Test2: (AIX host LUN read test with R6 LUN)
If we are not writing to this LUN (LUN 10), the read performance is 100 Mb/s.

Test3: (Windows host read/write test with R6 LUN)
The Windows host gives the same result as the AIX host in Test1.
Write: 50-60 Mb/s
Read: 5-10 Mb/s, which is not acceptable

We also got the same result on the Windows side when we tested with
OS tools (write: copy, read: copy).

Test4: (Windows host read/write test with R5 LUN)
The customer writes to the R5 LUN from the local disk and reads the data with the
TSM HSM client. The performance is:
Write: 50 Mb/s
Read: 50 Mb/s
Read performance is at an acceptable level in this case.

During the tests we used RAID groups that are
not used by other hosts or applications. These RAID groups are used only for this
purpose.

The customer MUST use R6 LUNs, and the normal workload is mixed read/write to the
LUNs, but in that case the read performance is not at an acceptable level.
The main workload for LUN 10 is reads, and they want to improve read performance
or prioritize reads over writes on this LUN.

LUN 10 is the only LUN in its RAID group, which consists of 14 500 Gb FC disks.

Any tips and ideas appreciated. We tried modifying the cache settings and the LUN 10 prefetch settings, and upgrading FLARE to the latest release 26, but nothing helped.

kelleg · Answer

Is the AIX host connected to the CX3 using Fibre Channel or iSCSI? How many connections (paths)?

What is the configuration of the raid group - what type of disks - FC or SATA - what is the speed of the disks?

Have you looked at the Navisphere Analyzer archives for this LUN?

What version of Windows are you using? Is the Windows host connected using iSCSI or FC?

Is this a NAS configuration?

glen

paulo3 · Answer

Hi Glen!

>Is the AIX host connected to the CX3 using fibre channel or iSCSI? How many connections - paths

2 FC cards, four paths.

>What is the configuration of the raid group - what type of disks - FC or SATA - what is the speed of the disks?

14 FC disks. Speed is 10k.

>Have you looked at the Navisphere Analyzer archives for this LUN?

Yes. If we write to and read from this LUN at the same time, the LUN reaches 100% utilization.

>What version of Windows are you using? Is the Windows host connected using iSCSI or FC?

2003. Windows is connected to the array via FC.

>Is this a NAS configuration?

No

We tested again, and both hosts produce the same result.

only read: 8-100 Mb/s

only write: 50-60 Mb/s

write and read: 40 Mb/s write 5 Mb/s read

Read should be higher, because the disk reads are what migrate the data from disk to tape.

BR: Paul

paulo3 · Answer

>When you do the read test, are you reading from the LUN and writing to the tape? If so, have you tried just a simple read?

We tried writing to disk too. We got the same results. The tape can write approx. 160 Mb/s, so it isn't a bottleneck.

>If you look in the Analyzer archive at the disks for the LUN, what is the IOPS on the disks during the read test? What is the IOPS at the LUN during the read test? What is the Read Cache Hit/s during the read test, what is the Read Cache Hit Ratio during the read test?

Sorry, I don't know the Analyzer figures.

kelleg · Answer

When you do the read test, are you reading from the LUN and writing to the tape? If so, have you tried just a simple read?

If you look in the Analyzer archive at the disks for the LUN, what is the IOPS on the disks during the read test? What is the IOPS at the LUN during the read test? What is the Read Cache Hit/s during the read test, what is the Read Cache Hit Ratio during the read test?

glen

naughty-natty · Answer

Hi, can you gather SPCollects and NAR files taken when you faced this performance problem and forward them to nannumal.thekkelal@gmail.com?

kelleg · Answer

At this point I would recommend opening a case with EMC for performance issues. glen

paulo3 · Answer

I opened it already.

kelleg · Answer

What's the case number? glen

paulo3 · Answer

34306932

kelleg · Answer

In the Documents section, see the following document. Page 40 has a description of the different types of disks, speeds and performance.

EMC CLARiiON Best Practices for Fibre Channel Storage - CLARiiON Release 24.pdf

Glen

SKT2 · Answer

Have you considered 'PowerPath write throttling' to prioritise the RD/WR?

driskollt1 · Answer

That's odd. It should be writing to cache and reading from disk.

When you're writing data, are you dumping a bunch of data to the LUN at once? Are you causing your LUNs to forced-flush (i.e. write cache at 99% full)?

Is there a reason why you're using RAID6?

I'm not too familiar with TSM or AIX...

Typically for a UNIX system I would:

Create 2 6+1 RAID5 Raid Groups.

Create 1 LUN in each RAID Group. - Each owned by a different SP

Use a volume manager (VxVM, ASM, ZFS) to stripe the volumes on the host side.

You'd get the same capacity and less write overhead. Also the benefit of 2 SPs using write cache.
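
To put rough numbers on the "same capacity and less write overhead" point, here is a small Python sketch. It assumes the generic textbook penalties for small random writes (4 back-end I/Os per host write for RAID 5, 6 for RAID 6), takes the 14-disk RAID 6 group as 12+2, and ignores write cache, full-stripe writes and hot spares. The function names are made up for illustration and the figures are not measured CLARiiON values.

```python
# Textbook back-end I/Os per small random host write (read-modify-write).
# Generic RAID figures (an assumption), not measured CLARiiON values.
WRITE_PENALTY = {"raid5": 4, "raid6": 6}
PARITY_DISKS = {"raid5": 1, "raid6": 2}


def data_disks(level, disks_per_group, groups=1):
    """Number of disks' worth of usable capacity across all groups."""
    return (disks_per_group - PARITY_DISKS[level]) * groups


def backend_ios(level, host_writes):
    """Back-end disk I/Os generated by small random host writes."""
    return host_writes * WRITE_PENALTY[level]


r6 = data_disks("raid6", 14)            # one 12+2 RAID 6 group
r5 = data_disks("raid5", 7, groups=2)   # two 6+1 RAID 5 groups
print("usable data disks:", r6, "vs", r5)
print("back-end I/Os per 1000 host writes:",
      backend_ios("raid6", 1000), "vs", backend_ios("raid5", 1000))
```

Under these assumptions both layouts give 12 data disks, while the RAID 6 group does about 50% more back-end work for the same small random writes.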

SKT2 · Answer

'Create 1 LUN in each RG, each owned by a different SP, then stripe at host level.' What if I make a striped metaLUN instead, to avoid host-side striping? And should I keep both LUNs on the same SP or on different SPs to get better performance?

driskollt1 · Answer

UNIX volume managers are like a million times better than the host-side striping you get with Windows.

Plus a lot of UNIX file systems don't handle things like LUN migrations to larger LUNs very well. Typically I give space to UNIX servers in chunks (e.g. 532 GB R10 LUNs that UNIX admins stripe together; if they need 3 TB I give 6 532 GB LUNs).

I've had good luck with host-side striping using UNIX volume managers (Oracle ASM, VxVM, ZFS).

It's easier to manage for me and the UNIX admins as well. We migrated all of our data from some CX3-80s to some CX4-960s with 0 downtime.

Things I can do with host-side striping that you can't by using metaLUNs:

Stripe across both SPs - allows you to stripe and use both SPs (more write cache to use - also the potential to fill up both SPs write cache so be sure to allocate appropriate spindle count)

Drop LUNs to reclaim space

Migrate to new storage without downtime - (I haven't looked at PowerPath Migration Enabler, it might be able to do this now)

Some volume managers still don't handle larger than 2TB volumes - Host side striping is a necessity on some systems when you need large volumes.

Our UNIX servers are pretty beefy though, so the load that the volume manager puts on the servers is pretty minimal.

I would never do host-side striping on a Windows box either. I would use metaLUNs for that.

I would alternate the SPs owning the LUNs, i.e. vol1 - SPA, vol2 - SPB, vol3 - SPA, vol4 - SPB.
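
The chunking and SP alternation described in this post can be sketched in a few lines of Python. The 532 GB chunk size and the SPA/SPB alternation come from the post; the function name and the `vol` naming are only illustrative, and 1 TB is taken as 1000 GB.

```python
import math

CHUNK_GB = 532  # LUN size handed out per chunk, as in the post


def plan_luns(required_gb, chunk_gb=CHUNK_GB, sps=("SPA", "SPB")):
    """Return (volume name, owning SP) pairs, alternating SP ownership."""
    count = math.ceil(required_gb / chunk_gb)
    return [(f"vol{i + 1}", sps[i % len(sps)]) for i in range(count)]


for name, sp in plan_luns(3000):   # a 3 TB request
    print(name, "-", sp)
```

For a 3 TB request this yields six LUNs, three owned by each SP, which the host volume manager would then stripe together.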

kelleg · Answer

Pal, I noticed that this case was closed without a resolution - do you still want this to be investigated? Did you find the problem? Is it fixed? glen
