We are currently having major issues with our VMware VDI setup, and just about every website points to our SAN as the probable cause. Since we only have one NS-480/Celerra, there is no way to pull it out of production for testing. What I have tried instead is to perform some basic file copy operations and time them from one of our ESXi hosts.
I am NOT a SAN guy, so please be a little patient with me. If these tests are flawed, I am happy to try something else, so long as it will not bring down the system.
Our ESXi hosts run v5.0 on Dell R900s (four quad-core Xeon CPUs, 128GB RAM) with Broadcom gigabit NICs and QLogic dual-port 4Gb FC controllers.
My test host is under negligible load (idle VMs). The host is connected to the NS-480 through both 4Gb FC controller ports using the EMC PowerPath driver. The host also has an NFS datastore presented to it over GigE. I believe the datastores are constructed as follows:
Replica: 135GB of SSD storage
L/C (Linked Clones): 2TB Auto tiered from SSD, 15K, 10K, and 7.2K drives
CIFS: 16TB RAID5 8+1 of 7.2K drives
NFS: I believe it shares the CIFS storage, only 500GB is presented
ESX Local: 60GB RAID1 SAS drives on the PCIe bus.
View_15k: 2TB of RAID10 using 30 146GB 10K drives (built for Exchange/Oracle)
Jump: Another local SAS drive on a remote server.
What has me concerned is that at home I have a QNAP 459 Pro+ NAS with four 2TB 5900 RPM green drives. I can reliably hit 77MB/s and higher copying to and from my QNAP NFS mount, which seems pretty good for SOHO equipment. Those speeds blow away what I can get from the EMC SAN on both FC and NFS mounts!!! I can SCP a file from the SSD LUN about as fast as I can copy it over 4Gb FC.
The test I am performing involves copying a 3.8GB DVD ISO image back and forth between these various datastores. I am doing this during off-peak hours to prevent the data from being tainted by production use. The NFS connection is on the same subnet as the ESX host management interface, and the ESX host and Celerra are on the same Cisco 3750G switch stack. This should be the fastest, lowest-latency link possible short of a crossover cable.
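For anyone who wants to reproduce the timing method, it is essentially the sketch below. The paths and scratch-file size are stand-ins of my choosing — on the host, the source and destination would be datastore paths under /vmfs/volumes/, and the real test uses the full 3.8GB ISO:

```shell
#!/bin/sh
# Sketch of the copy-timing method. SRC/DST are placeholder paths; on the
# ESXi host they would be /vmfs/volumes/<datastore>/ paths and the payload
# would be the 3.8GB ISO rather than a small scratch file.
SRC=/tmp/iso_stand_in.bin
DST=/tmp/copy_target.bin
SIZE_MB=100   # scratch size for illustration; the real test is ~3800 MB

# Create the scratch file.
dd if=/dev/zero of="$SRC" bs=1M count="$SIZE_MB" 2>/dev/null

# Time the copy; sync so buffered writes are included in the measurement.
START=$(date +%s)
cp "$SRC" "$DST"
sync
END=$(date +%s)

SECS=$((END - START))
[ "$SECS" -eq 0 ] && SECS=1   # avoid divide-by-zero on fast local copies
echo "${SIZE_MB} MB in ${SECS}s = $((SIZE_MB / SECS)) MB/s"
rm -f "$SRC" "$DST"
```

Whole seconds are coarse for a fast local copy, but over a 3.8GB file against a slow datastore the resolution is plenty.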
See the attached document for results. I am continuing to run tests at various hours. I don't think the solution is to replace the EMC with a QNAP!
So performance across FC, CIFS, and NFS is pretty dismal compared to my crappy home lab. I understand IOPS are also important, but if there is little demand on the bus, these file copies should smoke, shouldn't they?
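For context, here are the rough theoretical ceilings of the links involved — spec figures, not anything I've measured — which show how far below the wire speed these copies are sitting:

```shell
#!/bin/sh
# Rough link ceilings (spec figures, not measurements):
# 4Gb FC signals at 4.25 Gbaud with 8b/10b encoding -> ~400-425 MB/s usable per port.
# Gigabit Ethernet is 1000 Mb/s on the wire -> ~125 MB/s before TCP/NFS overhead.
FC_MBS=$((4250 * 8 / 10 / 8))   # 8b/10b strips 2 of every 10 bits: ~425 MB/s
GE_MBS=$((1000 / 8))            # 125 MB/s raw
echo "4Gb FC ceiling: ~${FC_MBS} MB/s per port"
echo "GigE ceiling:   ~${GE_MBS} MB/s"
```

Even a single GigE NFS mount has headroom for ~125 MB/s, so a 77 MB/s copy from a home NAS outrunning the FC-attached array is hard to explain by link limits.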
Like I said, I am not the SAN guy, but I can probably get access to the tool. I'm not sure what I would be looking at or where, but I'll try anything to get to the bottom of this. Do you suspect there is a problem and that I should be getting much better numbers?
I ran some additional tests and updated the chart in the original post.
Using a combination of tar and SSH, I have seen a peak of 58MB/s on the work LAN from a server VM to a physical server. I have found nothing to explain why my 4Gb FC connections cannot come close to this.
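The tar/SSH measurement was basically the pipeline sketched below. The hostname and file paths are placeholders; the local stand-in at the end just makes the same pipe runnable without a remote host:

```shell
#!/bin/sh
# Network form of the test (user@remotehost is a placeholder):
#   tar cf - /path/to/bigfile | ssh user@remotehost 'cat > /dev/null'
# Timing that pipeline and dividing the file size by the elapsed seconds
# is what produced the ~58 MB/s figure quoted above.
#
# Local stand-in so the tar side of the pipe can be exercised anywhere:
dd if=/dev/zero of=/tmp/bigfile.bin bs=1M count=50 2>/dev/null
BYTES=$(tar cf - /tmp/bigfile.bin 2>/dev/null | wc -c)
echo "tar streamed $BYTES bytes"
rm -f /tmp/bigfile.bin
```

Streaming through ssh adds encryption overhead, so the raw disk/network path should be capable of more than whatever this reports.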
The SAN team tells me we have plenty of capacity; we aren't even scratching the surface. All the logs they send to EMC come back saying everything is groovy. They haven't seen my test data yet. We have put our VMs and ESX hosts under the microscope as well and have tweaked as much as we can. Even with ESX out of the picture, the SAN performance on CIFS and NFS stinks according to my tests. Why is this so invisible to them?
If you do not have Analyzer, then your raw data points will be encrypted and only EMC will be able to decrypt and analyze them. However, I suspect I am not getting a clear understanding of your setup. Are the VMFS datastores presented from the SAN or over NFS? The reason I ask is that Analyzer is only good for SAN data; if it's NFS, then there are other issues we need to examine (a clean network springs to mind).