
June 18th, 2015 20:00

ESX 5.5 storage vMotion speed

Hello experts,

I have an ESX 5.5U2 server (fully patched) connected to a 10G NFS datastore as well as a 1TB ScaleIO volume.  The ScaleIO volume is presented by 3 bare metal CentOS servers - all over 10G.

After running some performance tests, I noticed that a Storage vMotion operation gets 225MB/sec reads FROM the ScaleIO volume TO the NFS datastore.  However, when I migrate the same VM from the NFS datastore back to the same ScaleIO volume, I only get 75MB/sec write speeds.  I get the same speeds regardless of which virtual disk format I choose (thin, thick lazy zeroed, thick eager zeroed).  Once the VM has been migrated, I can get 750+ MB/sec reads/writes inside the VM using Iometer, so I am confident the network and storage can handle the speeds.

What options can I look at to figure out why the Storage vMotion speeds are so slow?  It seems like something is capping both the read and write speeds to the ScaleIO volume during a storage migration.

Thanks for any pointers.


June 18th, 2015 22:00

Please run the following tests with a single writer and a 1MB IO size (see the example commands below):

1. Write test on a VM located on NFS storage

2. Read test on a VM located on NFS storage

3. The same tests on a VM located on ScaleIO storage
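A single-writer, 1MB sequential test could look something like this with fio (the filename and size below are just placeholders, adjust them for your environment):

fio --name=seq-write --rw=write --bs=1M --numjobs=1 --iodepth=1 --direct=1 --ioengine=libaio --size=10G --filename=/fio-testfile
fio --name=seq-read --rw=read --bs=1M --numjobs=1 --iodepth=1 --direct=1 --ioengine=libaio --size=10G --filename=/fio-testfile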

An additional tip: ScaleIO was designed to handle high workloads generated by multiple threads and VMs, while Storage vMotion uses a single thread.

You can run several Storage vMotion operations at the same time to increase the workload on ScaleIO.


June 19th, 2015 08:00

To answer your questions:

1. Write test on Linux VM located on NFS storage:  475MB/sec (via: dd if=/dev/zero of=/dev/sda bs=1M) on the first pass and 675MB/sec on subsequent passes (see notes below).

2. Read test on Linux VM located on NFS storage:  995MB/sec (via: dd if=/dev/sda of=/dev/null bs=1M)

3. Write test on Linux VM located on ScaleIO storage: 40MB/sec on the first pass and 585MB/sec on subsequent passes

4. Read test on Linux VM located on ScaleIO storage: 192MB/sec with 512B read-ahead.

* Note: The difference between the initial write pass and subsequent passes on the ScaleIO volume is due to the VM's disk being thin provisioned.  The initial pass has to grow and allocate the large VMDK as it writes, while subsequent passes don't have that overhead.  If I create the VM using the "Thick Provision Eager Zeroed" option, all write passes have the same performance.
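For anyone repeating these tests: a direct-I/O variant of the dd runs (the parameters below are illustrative, not exactly what I ran) takes the guest page cache out of the picture:

dd if=/dev/zero of=/dev/sda bs=1M count=10240 oflag=direct   # write test, bypassing the page cache; count is just an example
dd if=/dev/sda of=/dev/null bs=1M count=10240 iflag=direct   # read test, bypassing the page cache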

During my testing, I also noticed that cloning or migrating a VM on a VMFS3 volume with an 8MB block size is about 2.5x faster than the same operation on a VMFS5 volume.  When I clone a VM on VMFS5, the ScaleIO dashboard shows a maximum of 90MB/sec during the clone/migrate operation.  Cloning the same VM to the VMFS3 volume hits 225MB/sec...

What strikes me as odd is that the VMFS5 clone performance number (90MB/sec) is exactly the same number I see when running the dd test on a thinly provisioned volume.  This tells me either VMware or ScaleIO thinks the source VM is thin provisioned and incurs the same double-write penalty it does on a thinly provisioned volume.

Somewhere, some sort of throttle is getting applied to the cloning or storage migration process.  I just have not found it yet...
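In case it helps anyone else digging into this, a couple of generic places to check on the ESXi 5.5 side (just suggestions, not something confirmed as the cause here) are the DataMover advanced settings and live throughput in esxtop during a migration:

esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove   # VAAI XCOPY offload toggle
esxcli system settings advanced list -o /DataMover/MaxHWTransferSize         # XCOPY transfer size (only matters when offload is used)
esxtop   # press 'd' (adapters) or 'u' (devices) to watch throughput while the clone/migration runs

Keep in mind that when the source and destination are different datastore types (NFS vs. a block device), the copy is done by the host's software data mover rather than offloaded to the array, so host-side behavior and the single-threaded copy mentioned above matter more.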

June 19th, 2015 13:00

Can you run Storage vMotion of multiple VMs at the same time, first to the NFS datastore and then to the ScaleIO datastore, and post your test results?
