Below is a sample of performance data from Unisphere for VMAX for six of our storage groups; we have more storage groups with similar results. I find it interesting that the read response time is higher, in some cases much higher, than the write response time. This is counterintuitive. The only explanation I have is that all writes go to cache while only some reads can be served from cache; the reads that have to go to disk drive the read response time up. There seems to be a correlation between a much higher read than write response time and a lower % hit, for example ESX1_SG and ESX5_SG; however, this is not a perfect model, because ESX4_SG doesn't follow the pattern.
| ID | Host IOs/sec | Host MBs/sec | Read Response Time (ms) | Write Response Time (ms) | % Hit | % Writes | % Read Miss | Capacity (GB) | Members |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
My questions to the community are below. Thanks in advance.
Do others see the same read vs. write response time results?
You are correct in your assumption: a read miss will not be served by cache, so it takes longer to service, as the array needs to fetch the data from disk.
If so, is there any correlation between much higher read than write response times and lower-than-average % hit values?
Read hits are serviced by cache: the track you are requesting is already there, and cache reads are faster. The read response time also depends on factors such as how busy the overall back-end disks and back-end directors are.
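A quick back-of-envelope model shows why even a modest miss rate dominates the average read response time. The latency figures below are illustrative assumptions, not measurements from this array:

```python
# Rough model: average read response time as a weighted blend of
# cache-hit and disk-miss service times.
# Both latency figures are illustrative assumptions only.
CACHE_HIT_MS = 0.5    # assumed cache read service time
DISK_MISS_MS = 8.0    # assumed back-end disk read service time

def avg_read_rt(hit_pct):
    """Average read response time (ms) for a given % hit."""
    hit = hit_pct / 100.0
    return hit * CACHE_HIT_MS + (1.0 - hit) * DISK_MISS_MS

for pct in (95, 80, 60):
    print(f"{pct}% hit -> {avg_read_rt(pct):.2f} ms average read RT")
```

With these assumed numbers, dropping from a 95% to a 60% hit rate roughly quadruples the average read response time, which is consistent with the pattern of higher read response times on the lower-hit storage groups.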
Do you have any other explanation for read response being higher than write response?
Not really at this stage. You should, however, be looking at heatmaps and the like, just to make sure you don't have overly busy components.
Is there any tuning I can do to increase the %hit value?
This is workload dependent. Sequential workloads will have a higher hit rate because prefetch algorithms are more effective on them: when sequential access patterns are detected at the front end, the array automatically starts loading cache with the next sequential tracks.
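As a rough illustration of the idea only (this is a hypothetical sketch, not the array's actual prefetch algorithm, and the threshold and depth values are made up): once the last few requested tracks are consecutive, start reading ahead into cache.

```python
# Hypothetical sketch of sequential-detection prefetch (not the
# array's real algorithm): if the last few track requests are
# consecutive, preload the next tracks into cache.
SEQ_THRESHOLD = 3   # assumed: consecutive tracks before prefetch starts
PREFETCH_DEPTH = 4  # assumed: how many tracks to read ahead

def prefetch_candidates(history):
    """Return track numbers to preload, given recent requested tracks."""
    tail = history[-SEQ_THRESHOLD:]
    is_sequential = len(tail) == SEQ_THRESHOLD and all(
        b == a + 1 for a, b in zip(tail, tail[1:])
    )
    if is_sequential:
        nxt = tail[-1] + 1
        return list(range(nxt, nxt + PREFETCH_DEPTH))
    return []

print(prefetch_candidates([10, 11, 12]))  # sequential -> read ahead
print(prefetch_candidates([10, 7, 12]))   # random -> no prefetch
```

The payoff is that on a truly sequential workload, nearly every read after detection becomes a cache hit, which is why sequential workloads report much higher % hit than random ones.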
Do you have FAST configured on your array? One of the major benefits of FAST (even with a small number of EFD disks) is that it will relocate read-miss workload onto the EFD disks, minimising the overhead of the read-miss operation.
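The core idea can be sketched in a few lines (a hypothetical illustration of tiering by read-miss heat, not EMC's FAST implementation; the extent names and slot count are made up):

```python
# Hypothetical sketch of the FAST idea (not EMC's implementation):
# promote the extents suffering the most read misses to the EFD tier.
from collections import Counter

def pick_promotions(read_miss_events, efd_slots):
    """Return the extent ids with the highest read-miss counts."""
    counts = Counter(read_miss_events)
    return [ext for ext, _ in counts.most_common(efd_slots)]

# Each entry is the extent id that took a read miss.
misses = ["e1", "e2", "e1", "e3", "e1", "e2"]
print(pick_promotions(misses, 2))  # -> ['e1', 'e2']
```

Because the hottest read-miss extents end up on flash, the misses that do occur are serviced at EFD latency rather than spinning-disk latency, pulling the average read response time down.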
You can also get a generally higher read-miss rate if you don't have enough cache in the array, so increasing the amount of cache can, in certain circumstances, improve the hit rate.
The other VMAX feature available that can influence the read-hit rate is cache partitioning. If you have a very cache-hostile workload that is flushing cache more than it should, putting that workload into a cache partition can benefit the other users of the array.
We do, but only 8 EFDs per engine, which is the minimum best practice. The EFD tier is always full, so I'd imagine adding EFDs would really help read-miss, but performance seems "acceptable" according to our clients, so the cost of extra EFDs is why we haven't added more. Plus, the best practice of adding 8 at a time per engine makes it a pretty big jump.
Cache is maxed out in all 8 engines so no room for improvement there.
We have over 130 hosts on the VMAX, so cache partitioning would be pretty tough, and I'd imagine we might end up making things a little worse for some servers by not letting the array do its job. I think cache partitioning would help if it was done right, and our WP rarely gets over 5%. Our SE recommended keeping things simple, which I agree with until someone complains.
Writes should always go to cache, and reads may sometimes have to come from disk, so in general I would expect read response times to be higher than write response times.