I am looking into possible performance issues on a meta LUN on EFD disks on my VMAX. The LUN hosts a SQL Server 2008 database, and the SQL team is complaining that the disk is "slow". I find that hard to believe on flash, but OK. I have been looking at SPA and ECC Performance Manager. During the "slowness", the I/O profile is heavy random reads. I have been comparing the read response time and the read miss response time. I know a read miss goes to disk rather than cache. The read miss response time is consistently under 2 ms. Great! However, the read response time goes anywhere from 2 ms to 50 ms! Unfortunately, SPA and Performance Manager only show 5-minute increments, so I can't get much more granular on how long the spikes last.
My main question is: is read miss response time just how long it takes the VMAX to go to disk for the I/O, or does it measure the whole path, i.e. from the time the I/O comes in, misses cache, goes to disk, and comes back to the FA? Why would the read response time be so much higher than the read miss response time? Is 20-50 ms even something to be concerned about?
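Just to show why the numbers don't add up for me, here's my own back-of-envelope model (my assumption, not EMC's documented formula) of the overall read response time as a simple hit/miss blend:

```python
# My rough mental model (an assumption, not how SPA actually computes it):
# overall read response time as a weighted average of hit and miss times.
def blended_read_rt(hit_ratio, hit_rt_ms, miss_rt_ms):
    """Average read response time if it were just a hit/miss blend."""
    return hit_ratio * hit_rt_ms + (1 - hit_ratio) * miss_rt_ms

# Even with a 0% hit ratio, a 2 ms miss time caps the blend at 2 ms,
# so a 20-50 ms average read RT must be picking up something else.
print(blended_read_rt(0.5, 0.2, 2.0))  # 1.1
print(blended_read_rt(0.0, 0.2, 2.0))  # 2.0
```

If that model is right, the average can never exceed the slowest component, which is why the 50 ms spikes confuse me.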
The I/O profile of the LUN is consistent. It seems it has been doing these spikes from the beginning, but now that they are looking at it closely I just want to make sure the 20-50 ms spikes aren't causing issues.
Thanks for your help!
Can you use real-time collection in SPA?
What is the read miss percentage?
Also check the disk seek time.
I believe the read miss response time should be higher at some point.
If you have a meta, you also need to check the individual meta members.
Use symstat. Also, I hope you are not overloading the front end. Check on the host what the queue length is, whether these are concurrent reads, and how it's handling them.
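To see why queue length matters here, a quick sketch (a simplification I'm assuming, not a symstat formula): with a single service channel, each queued I/O waits behind the ones ahead of it, so response time grows with queue depth even when the service time itself stays low.

```python
# Back-of-envelope queuing effect (an assumed simplification, not a
# symstat metric): each I/O waits for everything queued ahead of it.
def approx_response_ms(queue_depth, service_ms):
    """Rough response time when queue_depth I/Os are ahead of yours."""
    return (queue_depth + 1) * service_ms

print(approx_response_ms(0, 2.0))   # 2.0 ms: no queue, pure service time
print(approx_response_ms(24, 2.0))  # 50.0 ms: a 24-deep queue
```

Note how a 2 ms device can still show 50 ms at the host once two dozen I/Os stack up in front of it.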
Is the meta concatenated or striped? How many FAs are allocated to the host?
SQL Server is very bursty in nature, particularly with its checkpoint writes (AKA lazy writes). You typically don't see them in Performance Manager data because the interval is too large, but you may see the evidence in PM by looking at the FA queue stats. Check your FA queue stats and you will likely find excessive queuing, which is what is elongating your read times. You can confirm this on the host by running perfmon and looking at the read/write counters. Run it over a period of time at 1-second intervals and you will get a good picture of the write bursts. I've seen 5000+ writes in one second.
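Once you have the 1-second samples out of perfmon, picking out the bursts is trivial. A minimal sketch, assuming you've exported per-second "Disk Writes/sec" values to a list (the sample numbers below are made up for illustration):

```python
# Hypothetical per-second "Disk Writes/sec" samples exported from
# perfmon at a 1-second interval (values invented for illustration).
samples = [120, 140, 110, 5200, 4800, 130, 90, 6100, 125, 100]

def find_bursts(samples, threshold):
    """Return (second_index, writes) pairs that exceed the threshold."""
    return [(i, v) for i, v in enumerate(samples) if v > threshold]

print(find_bursts(samples, 1000))   # the checkpoint-style write bursts
print(sum(samples) / len(samples))  # 1691.5: the smoothed average
```

The point of the second print: a multi-minute average like SPA's flattens those spikes completely, which is exactly why they never show up in the 5-minute data.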
The best way to avoid the impact of these writes is to assign more FAs to do the work. Make sure you have PowerPath so the I/Os are distributed across the FAs.