
July 29th, 2014 07:00

How to troubleshoot data storage performance bottlenecks


Introduction

Troubleshooting storage performance problems requires a basic understanding of common storage bottlenecks. Storage performance bottlenecks that can clog ports, controllers and disk drives require a mix of tools and IT expertise to find and solve. However, identifying the most common areas for storage bottlenecks can be the first step in troubleshooting them.


Detailed Information


The most common places for storage bottlenecks


1.     Storage-area network (SAN) fabric/Front-end ports

Potential problems include:

·         Oversaturation/overutilization if there's an inadequate number of ports in the array

·         Too much oversubscription in a virtual server environment

·         Improper load balancing across ports

·         Contention/traffic overload at inter-switch links (ISLs)

·         Host bus adapter (HBA) congestion if a single HBA port is overloaded
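To put a number on oversubscription, the usual back-of-the-envelope check is the fan-in ratio: total host-side HBA bandwidth divided by total array front-end bandwidth. A minimal sketch (the port counts and link speeds below are hypothetical, chosen only for illustration):

```python
# Estimate the SAN fan-in (oversubscription) ratio: total host HBA
# bandwidth versus total array front-end port bandwidth.
def oversubscription_ratio(host_ports, host_gbps, array_ports, array_gbps):
    """Return host-side bandwidth divided by array-side bandwidth."""
    return (host_ports * host_gbps) / (array_ports * array_gbps)

# Example: 40 hosts, each with one 8 Gb/s HBA port, sharing
# 4 x 8 Gb/s array front-end ports.
ratio = oversubscription_ratio(40, 8, 4, 8)
print(f"fan-in ratio: {ratio:.0f}:1")  # prints "fan-in ratio: 10:1"
```

A high ratio is not automatically a problem if hosts are mostly idle, but it tells you how badly things can congest when they all get busy at once.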

2.     Storage controllers

Potential problems include:

·         Oversaturating the controller with I/O, limiting the IOPS that can be processed from the cache to the array

·         Throughput overwhelming the processor

·         Overloaded CPU/insufficient processing power

·         Inability to keep up with the performance of solid-state drives (SSDs)

3.     Cache

Potential problems include:

·         Insufficient cache memory

·         Overloading the cache with writes, causing slow performance

·         Thrashing the cache by frequently accessing non-sequential data
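The damage a poor cache hit rate does can be sketched with a simple weighted average of cache-hit and disk-read latency. The latency figures below are hypothetical placeholders, not measurements from any particular array:

```python
# Effect of cache hit rate on average read response time.
# cache_ms and disk_ms are illustrative assumptions.
def avg_response_ms(hit_rate, cache_ms=0.2, disk_ms=8.0):
    """Weighted average of cache-hit latency and disk-read latency."""
    return hit_rate * cache_ms + (1.0 - hit_rate) * disk_ms

for hr in (0.95, 0.80, 0.50):
    print(f"hit rate {hr:.0%}: {avg_response_ms(hr):.2f} ms average")
```

Note how quickly the average degrades: dropping from a 95% to an 80% hit rate roughly triples the average response time, which is why cache thrashing shows up so visibly in monitoring.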

4.     Disk drives

Potential problems include:

·         Too many applications hitting disks

·         Insufficient number of drives for throughput or IOPS that application requires

·         Disks too slow to meet performance needs and support a heavy user workload

·         Disk groups are potential bottlenecks in a "classic" storage architecture, where the RAID configuration runs over at most 16 disks. A "thin-wide" architecture, which typically has more disks per LUN and therefore spreads data over a wider spindle set, is less prone to becoming a bottleneck because of the added parallelism
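A common way to size the drive count is to work back from the application's IOPS target, inflating writes by the RAID write penalty (2 back-end I/Os per host write for RAID 1, 4 for RAID 5). The per-drive IOPS figures below are widely used rules of thumb, not measured values:

```python
import math

# Rough drive-count sizing from an IOPS target, including the RAID
# write penalty. Per-drive IOPS values are rules of thumb.
PER_DRIVE_IOPS = {"15k_fc": 180, "10k_sas": 140, "7.2k_sata": 80}

def drives_needed(read_iops, write_iops, raid_penalty, drive="15k_fc"):
    """Back-end IOPS divided by per-drive capability, rounded up."""
    backend = read_iops + write_iops * raid_penalty
    return math.ceil(backend / PER_DRIVE_IOPS[drive])

# Example: 5,000 host IOPS at a 70/30 read/write mix on 15k FC drives.
print(drives_needed(3500, 1500, raid_penalty=4))  # RAID 5: 53 drives
print(drives_needed(3500, 1500, raid_penalty=2))  # RAID 1: 37 drives
```

The same workload needs noticeably fewer spindles on RAID 1 than RAID 5, which previews the RAID-selection tip further down.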



The key metrics to monitor


Array vendors formerly stressed IOPS and throughput, the "speeds and feeds," but now the main metric everyone wants to talk about is response time. It's not how fast you can move the data, but how fast you can respond to the request.

You can expect a response time of 4 ms for 15,000 rpm Fibre Channel disks, 5 ms to 6 ms for SAS disks, about 10 ms for SATA disks, and less than 1 ms for SSDs.

If you have all Fibre Channel disks and your response time is 12 milliseconds, something's wrong. The same goes for SSDs: if your response time is five milliseconds, something is wrong. It may be a connection issue, or you may have some faulty chips.
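That sanity check is easy to automate: compare each volume's measured latency against the rule-of-thumb expectation for its drive type. The thresholds come from the figures above; the 1.5x slack factor is an assumption to allow for normal queuing:

```python
# Flag volumes whose measured response time is well above the
# rule-of-thumb expectation for their drive type.
EXPECTED_MS = {"15k_fc": 4.0, "sas": 6.0, "sata": 10.0, "ssd": 1.0}

def is_suspect(drive_type, measured_ms, slack=1.5):
    """True if measured latency exceeds the expectation by more than slack x."""
    return measured_ms > EXPECTED_MS[drive_type] * slack

print(is_suspect("15k_fc", 12.0))  # True: 12 ms on FC disks is wrong
print(is_suspect("ssd", 0.8))      # False: within expectations
```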

In addition to response time, other key metrics to monitor include:

·         Queue depth, or the number of requests held in queue at one time; average disk queue length

·         Average I/O size in kilobytes

·         IOPS (reads and writes; random and sequential; average of overall IOPS)

·         Throughput in megabytes per second

·         Write percentage vs. read percentage

·         Capacity (free, used and reserve)
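Three of these metrics are not independent: by Little's Law, average queue depth equals IOPS multiplied by average response time. That makes it a useful cross-check on monitoring data. A minimal sketch:

```python
# Little's Law ties queue depth, IOPS and response time together:
#   average queue depth = IOPS x average response time (in seconds)
def queue_depth(iops, response_ms):
    """Average number of outstanding I/Os implied by IOPS and latency."""
    return iops * (response_ms / 1000.0)

# Example: 4,000 IOPS at a 5 ms average response time.
print(queue_depth(4000, 5.0))  # 20.0 outstanding I/Os on average
```

If your tools report a queue depth wildly inconsistent with the IOPS and latency they also report, one of the three measurements is wrong.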

Data storage performance tips and best practices


  1. Don't allocate storage based simply on free space. Take into account performance needs. Make sure you have enough drives for the throughput or IOPS you need.
  2. Distribute application workload evenly across disks to reduce the chance of hotspots.
  3. Understand your application workload profile, and match your RAID type to the workload. For instance, choose RAID 1 over RAID 5 for write-intensive applications: every write on RAID 5 requires a parity calculation, and that calculation takes time, whereas a RAID 1 write simply goes to the two drives much faster.
  4. Match the drive type -- Fibre Channel, SAS, SATA -- to the performance you expect. Use higher performing hard disk drives, such as 15,000 rpm Fibre Channel, for mission-critical business applications.
  5. Consider solid-state drive (SSD) technology for I/O-intensive applications, but not for applications where write performance is important.
  6. Seek tools that do end-to-end monitoring, especially for virtual server environments. In general, there is a wall between the virtual side and the physical side, so you need software that pierces it and looks at both, end to end.
  7. Weigh the pros/cons of short-stroking to boost performance. Formatting a hard disk drive so that data is written only to the outer sectors of the disk's platter can increase performance in high-I/O environments because it reduces the time the drive actuator needs to locate the data. The downside of short-stroking is that a substantial portion of the disk drive's capacity goes unused.
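The short-stroking trade-off in tip 7 can be made concrete with a back-of-the-envelope calculation. All numbers below are hypothetical: a 600 GB drive restricted to its outer 25%, with an assumed drop in average seek time from 3.5 ms to 1.5 ms:

```python
# Back-of-the-envelope short-stroking trade-off: capacity stranded
# versus seek-time reduction. All inputs are illustrative assumptions.
def short_stroke(capacity_gb, usable_fraction, full_seek_ms, stroked_seek_ms):
    """Return (usable GB, stranded GB, fractional seek-time reduction)."""
    usable = capacity_gb * usable_fraction
    stranded = capacity_gb - usable
    seek_gain = 1.0 - stroked_seek_ms / full_seek_ms
    return usable, stranded, seek_gain

usable, stranded, gain = short_stroke(600, 0.25, 3.5, 1.5)
print(f"usable {usable:.0f} GB, stranded {stranded:.0f} GB, "
      f"seek time cut {gain:.0%}")
```

In this sketch you give up three quarters of the capacity for roughly half the seek time, which is why short-stroking only makes sense where IOPS matter far more than gigabytes.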

Author: Jiawen


Quincy56:

July 29th, 2014 07:00

Nice write up, but I'm not sure I understand why you think EFDs are bad for write performance.  In VMAX all writes go to cache, so as long as WP limits are not hit, all drive technologies will have the same write performance.  EFDs can destage faster than FC or SATA disks, spindle for spindle, so they may be less likely to reach WP limits.

Also, I think the response times you expect are a bit optimistic. I did this chart some years ago, and I think it is closer to the best response time to expect with no queuing. 1-2 ms for EFDs at the volume level is reasonable.

[Attached chart: ScreenShot1225.jpg, expected best-case response times by drive type]



July 30th, 2014 01:00

Hi Quincy56,

“Nice write up, but I'm not sure I understand why you think EFDs are bad for write performance.  In VMAX all writes go to cache, so as long as WP limits are not hit, all drive technologies will have the same write performance.  ”

This article was posted on the Symmetrix forum, but it is about storage products in general, not particularly the VMAX, so I relocated this thread to the general topic column. SSDs are a great remedy for read performance, as long as you don't hit the bottleneck of the controller, but they are not a panacea for write performance.

“Also I think your response times that are expected are a bit optimistic.”

Thank you very much for your data. I think I should use the following expression to be more accurate:

It is possible to expect a response time of 6 milliseconds (ms) for a SAS disk, 4 ms for a 15,000 rpm Fibre Channel disk, and about 1 ms for SSDs.
