When it comes to optimizing performance, you will eventually run into storage bottlenecks.
To name a few:
1. Bandwidth is the one we can answer very quickly: if the number of MBps to/from your array falls short, you need more bandwidth. You can accomplish this by spreading the load more efficiently over the existing storage ports or by adding more ports to your array. One thing you can be sure of: if you're hammering the storage ports so hard that they actually reach the maximum they can handle, the disks are keeping up! So the disk / LUN layout is not a bottleneck at this point.
2. Each storage port can only handle a certain number of outstanding I/Os. For a vast range of arrays this number is 2048; for CLARiiON / VNX you need to do the math with 1600. If a port gets more than that, it will issue a QFULL reply to the HBA that requested the extra I/O. This QFULL will trigger an action in the operating system of that particular host. In the old days such a reply could make the OS lose access to its disk (LUN), but modern OSs can deal with this and will slow down sending I/Os to the array to allow the array to "recover" from the overload. This does however slow down I/Os on the host, and so slows down the application. Ways to avoid this are configuring the maximum queue depth / execution throttle in the HBA / OS, or moving hosts to other storage ports. In EMC midrange arrays you can use Analyzer to see if QFULLs are an issue. Set Analyzer to advanced before starting it, open the 3rd tab and select the storage ports. You will see a metric called QFULL which you can select.
3. The amount of cache is always a bottleneck, but you might never encounter a problem. When you do, check whether forced flushing is taking place. If it is, the disks may be too slow to handle the load. A wide variety of causes and solutions exist, ranging from changing storage tiers and adding disks to changing the layout of your LUNs. Traditional LUNs are carved out of a single RAID Group (RG), but a RAID Group can handle only so many IOps, and so can the LUNs that reside on it. When cache performance is an issue, this is usually a trigger to go look for problems on the disk side.
4. As each disk can only handle, for example, 180 IOps, a LUN which resides on an RG can handle at most that number of IOps times the number of disks. If other LUNs are on the same RG, the RG's maximum performance is shared between all LUNs on that particular RG. Furthermore there's a little caveat from queueing theory (often loosely referred to as "Little's Law"): even though 1 disk can handle a certain number of IOps, this comes at a cost. Up until about 70% of the maximum IOps the response times are reasonably OK, but if you go above that the response times climb very steeply. If you plot the response times in a graph you will see that the curve bends upward from the very beginning, but until around 70% this is acceptable. Above 70% a small increase in IOps will cause much higher response times. So even though you can actually get 180 IOps from a single disk, consider that actually reaching this value might hurt. A lot of performance graphs in the market today have a threshold at around 70%, indicating that if the IOps go above it, the performance is reaching critical levels.
5. When you have an array with multiple RGs you will notice that certain RGs perform very well while others suffer from heavy-duty hosts that hammer the LUNs (and so the RGs) so hard that performance degrades. A way to avoid this in the past was to implement metaLUNs. A metaLUN is actually 2 or more LUNs connected together, so the I/Os go to 2 or more RGs and in the end more disks handle the I/Os. If the load is spread evenly, chances are that no hot spots will occur anymore, but careful planning and rearranging will take up a lot of the storage admins' time. A metaLUN can be formed in 2 ways: concatenated and striped. If you need to enlarge a LUN for performance reasons and you need to do this on a single RG, you'd better use concatenated expansion. If you want the better-performing expansion, you need to expand the LUN using equally large component LUNs on other RGs. Striping makes sure that all disks in the involved RGs are used equally.
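The port limit from point 2 can be turned into a quick sanity check. The sketch below uses a common rule of thumb (an assumption here, not an EMC formula): the worst-case number of outstanding I/Os on a port is the per-LUN queue depth times the total number of LUNs presented through that port. The function names and the example host list are hypothetical.

```python
def worst_case_outstanding(queue_depth: int, luns_per_host: list[int]) -> int:
    """Worst case: every LUN on every host fills its queue at the same time."""
    return queue_depth * sum(luns_per_host)


def max_safe_queue_depth(port_limit: int, luns_per_host: list[int]) -> int:
    """Largest uniform per-LUN queue depth that cannot exceed the port limit."""
    total_luns = sum(luns_per_host)
    if total_luns <= 0:
        raise ValueError("at least one LUN must be presented through the port")
    return port_limit // total_luns


if __name__ == "__main__":
    # Hypothetical example: three hosts with 10, 20 and 20 LUNs on one port.
    hosts = [10, 20, 20]
    for limit in (1600, 2048):  # port limits quoted in point 2
        depth = max_safe_queue_depth(limit, hosts)
        print(limit, depth, worst_case_outstanding(depth, hosts))
```

In practice not every LUN is busy at once, so many admins deliberately oversubscribe; this only shows the setting at which a QFULL becomes impossible under the stated assumption.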
The storage pool technology is in fact a set of RGs which work together to handle the I/Os for all LUNs that reside in the pool. In this pool each RG has its own RAID protection. A single LUN is spread across all available RGs, as are all other LUNs. This way the pool provides a form of load balancing: all disks are used and almost no hot spots will spoil the performance of 1 or more LUNs.
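To illustrate the 70% rule of thumb from point 4, here is a minimal sketch that assumes a simple single-server queueing model (M/M/1), where response time = service time / (1 - utilization). Real disks and arrays are more complicated (caching, command queueing, seek patterns), so treat the numbers as the shape of the curve, not as predictions. The 180 IOps per disk comes from the text above; the function names and the 5-disk RG are hypothetical.

```python
def rg_max_iops(disks: int, iops_per_disk: int = 180) -> int:
    """Upper bound for a RAID Group: per-disk IOps times the number of disks."""
    return disks * iops_per_disk


def response_time_ms(load_iops: float, max_iops: float,
                     iops_per_disk: int = 180) -> float:
    """M/M/1 approximation: service time divided by (1 - utilization)."""
    utilization = load_iops / max_iops
    if not 0 <= utilization < 1:
        raise ValueError("load must be below the maximum IOps")
    service_time_ms = 1000.0 / iops_per_disk
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    ceiling = rg_max_iops(disks=5)  # hypothetical 5-disk RG
    for pct in (10, 30, 50, 70, 80, 90, 95):
        load = ceiling * pct / 100
        print(f"{pct:3d}% busy: {response_time_ms(load, ceiling):6.1f} ms")
```

Under this model the response time at 70% is already more than three times the bare service time, and going from 70% to 90% triples it again, which is why thresholds are usually drawn around 70%.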
Thanks everyone for participating and sharing ideas on this post and helping the community! I don't want to break the current conversation about the logs, but I would like to see some discussion on the performance monitoring tools and use cases, with the most "important parameters" that need to be considered when doing analysis in any environment. I understand that performance analysis is an ocean and varies with every scenario.
I know a few of them:
Symmetrix - SPA
CLARiiON - Navisphere Analyzer / Unisphere Analyzer
VNX - Unisphere Analyzer
Celerra(NAS) - ?
VBLOCK (converged infrastructure) - ?
Also, what if we encounter performance issues with SRDF: is there any tool to analyze them other than EMC's internal tool "symmerge"?
What about ProSphere, any thoughts on that?
I have Analyzer running on CLARiiON and VNX.
Every day the performance data is saved, but I never take the time to investigate it...
Sometimes, when expecting trouble, I open Analyzer to check "real-time" data... For analysing performance with NAR files, I don't think Analyzer is a very helpful tool. I also tried to retrieve data with the command line, but after a few days I stopped trying. Maybe I need some additional tools to make the analysis much less complicated...
The health check service is a real help for me.
On critical servers, I am continuously running perfmon to gather basic storage data. But then, of course, you don't get any insight into the overall performance of your storage system.
I think Analyzer is great. We have it running 24/7 and whenever we have problems I collect a week's worth of NARs (sometimes more) and merge them into a single large NAR. Analyzer (set to advanced) will show everything I need, but the AH (Analyzer Helper) is also a great tool, which helps me locate hot spots in an hour or so (depending on the machine I have this tool running on).
We also use SolarWinds for real-time monitoring and alerting purposes. On the host (Windows) perfmon is my preferred tool.
With Analyzer, do the I/Os shown at the disk level include the parity calculation I/Os? For instance, 1 write I/O coming from a host will result in 4 I/Os (RAID 5) on the physical disks. Should I expect to see those 4 I/Os when looking at the disk in Analyzer, or will I see just 1 I/O?
We also run Analyzer 24/7. It works well for when you hear about performance issues days after they occurred. Ideally we'll use all the data to do some system-wide trending as well. We are currently working on automating that process with a SQL database and custom reports.
If you look at the I/O at the front end you will just see the actual number of host I/Os, but if you look at the back-end disk I/Os then yes, you will definitely see the I/Os including the write penalty.
Exactly: it wouldn't be much of an analyzer product if it didn't show this! You can also see the influence of the cache filling up and the watermark flushing and about everything you always wanted to know.
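The front-end versus back-end difference discussed in the replies above can be sketched with simple arithmetic. The RAID 5 penalty of 4 is the figure from the question; the penalties of 2 for RAID 1/0 and 6 for RAID 6 are the commonly quoted values. The sketch deliberately ignores write cache effects such as coalescing and full-stripe writes, so what Analyzer shows on the disks can be lower. The function name and the workload numbers are hypothetical.

```python
# Back-end disk I/Os generated per host write, by RAID type.
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def backend_iops(read_iops: int, write_iops: int, raid_type: str) -> int:
    """Back-end disk IOps: reads pass through 1:1, writes are multiplied."""
    try:
        penalty = WRITE_PENALTY[raid_type]
    except KeyError:
        raise ValueError(f"unknown RAID type: {raid_type}") from None
    return read_iops + write_iops * penalty


if __name__ == "__main__":
    # Hypothetical workload: 700 reads/s and 300 writes/s at the front end.
    for raid in WRITE_PENALTY:
        print(raid, backend_iops(700, 300, raid))
```

This also shows why a write-heavy host can saturate a RAID 5 RG while its front-end numbers still look modest.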
The Analyzer Helper is also available to EMC partners.
I think you need to be a Velocity Implement partner to have access.
I know that we can download it from Powerlink, under the EMC Services Partner web page.