We have a number of Windows guests (2008 R2 file servers) that use Windows compression (for historical reasons). Within the last month we upgraded our environment: ESXi to 5.0 and the VNX to 5.32.000.5.015. All our VMDKs are on block LUNs; we don't use VNX File at all. Since the upgrade, I/O performance of the guests has decreased noticeably from an end-user perspective: user experience, backup times, basic file copies. We have narrowed it down to only those guests using Windows compression.
If I run Analyzer while they are doing a test copy, I don't see anything particularly wrong with the stats except that the throughput is slow. The IOPS aren't excessive, the service and response times are good, and the queue length is not very long.
The setup is a VNX5500 with 3-tier FAST VP LUNs and 1TB of FAST Cache.
The problem is occurring on multiple LUNs in the same pool, and at two different sites with near-identical setups (Vblocks).
The only thing I see that's noticeably different since the upgrade is the % dirty pages of FAST Cache, which has changed dramatically: about 20% higher on both SPs.
I could just uncompress everything, but I would like to try to figure out why it's happening.
How do the perfmon counters PhysicalDisk\Avg. Disk sec/Read and PhysicalDisk\Avg. Disk sec/Write look inside one of those VMs?
Very low most of the time; the production file server was averaging about 38 R/sec and 3 W/sec.
Throughput during a file copy is pretty inconsistent. Copying a largish (3GB) file locally (i.e. within the same drive) achieves anywhere from 7 to 15MB/s; on a non-compressed server we got about 120MB/s.
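For anyone wanting to reproduce this kind of test in a repeatable way, here is a minimal sketch of a same-volume copy benchmark (the file size and temp location are arbitrary choices for illustration, not what was used in the thread):

```python
import os
import shutil
import tempfile
import time


def copy_throughput_mb_s(size_mb=64):
    """Write a test file, copy it within the same directory, return MB/s."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.bin")
        dst = os.path.join(d, "dst.bin")
        # Random data defeats any filesystem compression shortcuts,
        # so the copy actually moves the stated number of bytes.
        with open(src, "wb") as f:
            f.write(os.urandom(size_mb * 1024 * 1024))
        start = time.perf_counter()
        shutil.copyfile(src, dst)
        elapsed = time.perf_counter() - start
        return size_mb / elapsed


print(f"{copy_throughput_mb_s():.1f} MB/s")
```

Note that a first run may be flattered by the OS write cache; running it a few times and comparing against an uncompressed volume gives a fairer picture.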
Yes, the network was our first suspect, but that was ruled out a couple of weeks ago. I was under the impression that Windows defrag on a FAST VP LUN was not a good idea?
I'll have to talk to the file admins. At the moment they are going to uncompress an entire volume and see whether that makes any difference on that particular server, keeping it to one change at a time.
We have given them a new LUN for the decompression (disk space blew out from 300GB to 500GB), which has been built on VMFS5 just in case that was an issue.
All LUNs are thick, by the way, eager-zeroed from the ESX side. The file servers each get one or more LUNs per guest, so they're not shared with other guests.
I thought defrag was not recommended for thin LUNs, but I can't remember exactly. Anything new in the environment (new antivirus, backup software, new version of VMware Tools)?
VMware Tools is new, because we just upgraded from ESX 4.1 to 5.
AV and backup software haven't changed, but backups are now experiencing issues: we were unable to complete our end-of-month backups as they took days to run.
All fingers are pointing to Windows compression at this stage, but we can't work out what in the underlying infrastructure stack has caused compression to suddenly become a performance problem.
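One piece of background that may explain why compression amplifies the pain: NTFS compresses in 16-cluster compression units (64KB at the default 4KB cluster size), so large sequential transfers get broken into many small unit-sized I/Os. A back-of-the-envelope sketch of the I/O-count difference for the 3GB test file (the 1MB uncompressed transfer size is an assumption for illustration):

```python
# Rough illustration of I/O fragmentation on an NTFS compressed volume.
# The uncompressed transfer size is an assumed value, not a measurement.
CLUSTER = 4 * 1024                # default NTFS cluster size (4KB)
COMPRESSION_UNIT = 16 * CLUSTER   # NTFS compression unit: 16 clusters = 64KB


def io_count(file_size, io_size):
    """Number of I/Os needed to transfer file_size bytes at io_size each."""
    return -(-file_size // io_size)  # ceiling division


file_size = 3 * 1024**3  # the 3GB test file from the thread

# Assumed ~1MB sequential transfers on the uncompressed volume,
# versus compression-unit-sized transfers on the compressed one.
uncompressed = io_count(file_size, 1024 * 1024)
compressed = io_count(file_size, COMPRESSION_UNIT)

print(f"uncompressed: ~{uncompressed} I/Os, compressed: ~{compressed} I/Os")
```

Sixteen times the I/O count for the same payload, plus the decompress/recompress CPU work, would make any added per-I/O latency in the stack far more visible on compressed volumes than on uncompressed ones.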
I might be wrong, but just to make you aware of a code bug in Release 32 P15 that can cause slow performance for VMware when using VAAI: go to support.emc.com and search for emc313487. As I said, it may not be the cause, but I wanted to flag it.
Thanks for the Primus article. I don't think that's the problem, as I got great throughput on the Storage vMotions we did today, ~330GB/hour. Good to have as a reference, though.