Unsolved

3 Posts

1570

April 22nd, 2008 08:00

LUN performance issues

Hi, I am wondering whether the 360,753,830 Stripe Crossings reported for a 2TB file-share LUN are the cause of its performance problem. This LUN does not have the alignment offset, which I know is a best practice, but I would like to be sure that performance will noticeably improve before moving the data to a new LUN that does have the alignment offset.
Can anyone give some feedback on determining the root cause of the performance issues for this LUN? Some additional information: Read Throughput is 170 [IO/s], Write Throughput is 109 [IO/s], and utilization is 94.93%.

6 Operator

4.5K Posts

April 22nd, 2008 10:00

Will need a bit more data:

1. Raid type
2. number of disks in the Raid Group
3. number of LUNs in the Raid Group
4. Total IO throughput for all LUNs in the RG
5. Queue Length for each of the LUNs in the RG

Stripe Crossings is a count of the times that your data crosses a stripe when you write. If you have a 4+1 R5, the stripe is 64KB (each disk) * 4 (data disks) = 256KB. If you write 1MB of data, you'll get 4 stripe crossings. Large IOs generate crossings even when everything is aligned, so a high count is not always a good indication of a performance issue.
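The stripe arithmetic above can be sketched in a few lines of Python. This is only an illustration: the thread does not say exactly how the array increments its Stripe Crossings counter, so the sketch reports both the number of stripes an IO touches and the number of stripe boundaries it crosses. The function names and the 63-sector (32,256-byte) starting offset in the second example are assumptions for illustration, not values from this thread.

```python
KB = 1024

def stripe_size_bytes(element_kb: int, data_disks: int) -> int:
    """Stripe size = per-disk element size * number of data (non-parity) disks."""
    return element_kb * KB * data_disks

def stripes_touched(offset_bytes: int, length_bytes: int, stripe_bytes: int) -> int:
    """Number of distinct stripes covered by an IO starting at offset_bytes."""
    if length_bytes <= 0:
        return 0
    first = offset_bytes // stripe_bytes
    last = (offset_bytes + length_bytes - 1) // stripe_bytes
    return last - first + 1

def boundaries_crossed(offset_bytes: int, length_bytes: int, stripe_bytes: int) -> int:
    """Number of stripe boundaries the IO passes over (stripes touched minus one)."""
    return max(stripes_touched(offset_bytes, length_bytes, stripe_bytes) - 1, 0)

# 4+1 RAID 5 with 64KB elements, as in the post above.
stripe = stripe_size_bytes(64, 4)          # 256KB
one_mb = 1024 * KB

aligned_stripes = stripes_touched(0, one_mb, stripe)
aligned_crossings = boundaries_crossed(0, one_mb, stripe)

# Hypothetical misaligned start: 63 sectors * 512 bytes into the first stripe.
misaligned_offset = 63 * 512
misaligned_stripes = stripes_touched(misaligned_offset, one_mb, stripe)
misaligned_crossings = boundaries_crossed(misaligned_offset, one_mb, stripe)

print(stripe, aligned_stripes, aligned_crossings, misaligned_stripes, misaligned_crossings)
```

Under these assumptions an aligned 1MB write covers 4 stripes, and the same write started at a misaligned offset spills into a fifth. Either way the counter grows with IO size, which is why the raw number alone says little about performance.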

The offset alignment affects disk crossings, which will affect performance. The Best Practices guides have information on the types of IO that are more affected by this than others.

Some on this forum can probably provide you with specific examples of what they have seen.

Please see the documents below for additional information about Best Practices.


EMC CLARiiON Best Practices for Fibre Channel Storage: FLARE Release 26 Firmware Update - Best Practices Planning

http://powerlink.emc.com/km/live1/en_US/Offering_Technical/White_Paper/H2358_clariion_best_prac_fibre_chnl_wp_ldv.pdf

EMC CLARiiON Fibre Channel Storage Fundamentals - Technology Concepts and Business Considerations

http://powerlink.emc.com/km/live1/en_US/Offering_Technical/White_Paper/H1049_emc_clariion_fibre_channel_storage_fundamentals_ldv.pdf


regards,
glen kelley

410 Posts

April 22nd, 2008 21:00

Have the LUN cache and SP cache been enabled?

3 Posts

April 23rd, 2008 07:00

Here's the information requested:
1. Raid type is RAID 5
2. number of disks in the Raid Group is 9
3. number of LUNs in the Raid Group is 2, but the 2nd one has low host i/o.
4. Total IO throughput for all LUNs in the RG is similar to the single LUN
5. Queue Length for each of the LUNs in the RG: Where do I find this?

The SP cache has been enabled. Where do I check whether the LUN cache is enabled?
Thanks

6 Operator

4.5K Posts

April 23rd, 2008 15:00

To see the cache setting for the LUNs, right click on the LUN and select Properties and then the cache tab.

Queue length is normally viewed when you view an Analyzer archive file.

A Raid 5 using 9 disks is an 8+1. If each disk is a 10K FC disk, then the IO/s for the raid group would be 120 IOPS (the limit on each disk for best performance) * 8 (the number of non-parity disks) = 960 IOPS.

If your IOPS are exceeding this, then you will see higher queue lengths and higher response times.
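As a rough sketch, the rule of thumb above can be written out and compared with the host IO rates given in the original question (170 read + 109 write IO/s). The function name is made up for illustration, and the comparison is deliberately simplistic: it uses host-side IO/s only and ignores cache effects and any extra back-end disk work that RAID 5 writes generate, so treat it as a first sanity check rather than a sizing result.

```python
def raid5_rule_of_thumb_iops(total_disks: int, per_disk_iops: int = 120) -> int:
    """Rule-of-thumb limit: per-disk IOPS * number of non-parity disks.

    per_disk_iops=120 is the 10K rpm FC figure quoted in this thread.
    """
    data_disks = total_disks - 1  # RAID 5 gives up one disk's worth of capacity to parity
    return per_disk_iops * data_disks

limit = raid5_rule_of_thumb_iops(9)   # 8+1 raid group from this thread
host_iops = 170 + 109                 # read + write IO/s reported for the LUN
headroom = limit - host_iops

print(f"rule-of-thumb limit: {limit} IOPS")
print(f"reported host load:  {host_iops} IOPS")
print(f"headroom:            {headroom} IOPS")
```

By this estimate the reported host load is well below the raid group figure, which suggests looking at queue lengths and response times before concluding that the drives are saturated.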

glen

18 Posts

July 15th, 2010 07:00

Good Glen!

Henry

261 Posts

July 15th, 2010 09:00

The utilization statistic is a bit misleading, so be careful about saying there is an issue because of it. Utilization by definition "Describes the fraction of a certain observation period that the system component is busy serving incoming requests." (from Navisphere help). Basically, if the LUN had at least 1 IO to complete during every poll that was done, it shows as 100% busy.
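A toy model makes the point concrete. The sketch below is a simplification based only on the description above, not the actual Navisphere calculation: it treats each poll as "busy" if at least one IO was outstanding, and the sample queue depths are invented for illustration.

```python
def utilization(outstanding_per_poll: list[int]) -> float:
    """Fraction of polls in which the component had at least one IO outstanding."""
    if not outstanding_per_poll:
        return 0.0
    busy_polls = sum(1 for depth in outstanding_per_poll if depth > 0)
    return busy_polls / len(outstanding_per_poll)

# Hypothetical samples: a lightly loaded LUN that always has one IO in flight...
light_load = [1, 1, 1, 1, 1, 1, 1, 1]
# ...and a heavily queued LUN.
heavy_load = [32, 40, 28, 36, 44, 30, 38, 41]
# A LUN that is idle for half of the polls.
half_idle = [1, 0, 2, 0, 1, 0, 3, 0]

print(utilization(light_load), utilization(heavy_load), utilization(half_idle))
```

In this model the light and heavy cases both report 100% utilization, so the number by itself cannot distinguish a busy-but-healthy LUN from an overloaded one.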

By chance, do you have the license for Navisphere Analyzer? If so, then you can review many other things that might help you figure out what is happening.

First off, you can check the queuing on the LUN and on the drives under the LUN. If there is queuing on the drives, then as Glen was mentioning, you may be exceeding the physical capabilities of the drives. If there is queuing, then consider reviewing the response time and service time values. Response time is the average total time an IO saw for the chosen device, and service time is the average time the operation took to complete. The difference is that response time factors in time spent in a queue and service time doesn't. If the two values differ by a large amount, then the queuing is having a large impact.
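The relationship described above can be sketched as a small calculation. The function names and the sample values (25 ms response time, 5 ms service time) are hypothetical and chosen only to show the arithmetic; real values would come from an Analyzer archive.

```python
def queue_wait_ms(response_ms: float, service_ms: float) -> float:
    """Average time an IO spent waiting in queue: response time minus service time."""
    return max(response_ms - service_ms, 0.0)

def queue_share(response_ms: float, service_ms: float) -> float:
    """Fraction of the response time that was spent queued rather than being serviced."""
    if response_ms <= 0:
        return 0.0
    return queue_wait_ms(response_ms, service_ms) / response_ms

# Hypothetical averages for one LUN.
response = 25.0   # ms, includes time in queue
service = 5.0     # ms, time to complete the operation itself

wait = queue_wait_ms(response, service)
share = queue_share(response, service)

print(f"queue wait: {wait:.1f} ms ({share:.0%} of response time)")
```

With these made-up numbers most of the response time is spent queued, which is the pattern that would point toward the drives or raid group being oversubscribed.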

The 120 IOPS mark for 10k rpm drives, along with bandwidth specs for each drive speed, can be seen in the EMC Best Practices Guide. The 120 IOPS figure for 10k rpm drives came from a benchmark test under very specific parameters, so the actual point where a drive bottlenecks depends on the type, locality, and size of the IO. The guide also has a section on "Storage System Sizing and Performance Planning" which has a lot of good info. Other performance info can be found in Navisphere help under the section "Analyzing storage-system performance using Analyzer".
