Unsolved
ScaleIO VMware Poor Performance
We had good success with ScaleIO running on dedicated CentOS 7 bare-metal servers.
Environment: 4 nodes running CentOS 7, dual 10Gb attached, with local SSD JBOD storage.
SDC client performance is great and the systems are very stable under different workloads.
We have not been successful running the SDS within VMware in a hyper-converged fashion.
We take the exact same system as above and re-install it in this fashion:
Environment: the SAME 4 nodes, this time running ESX 6.0u1a, dual 10Gb attached, with local SSD JBOD storage.
Hardware is the same, network is the same; we just replace the OS with ESX.
Each node with the LSI controller in direct JBOD mode (controller cache is bypassed)
2 MDMs running as CentOS 7 guest OS on ESX local storage
1 TB (Tie-Breaker) running as CentOS 7 guest OS on ESX local storage
4 SDSs running as CentOS 7 guest OS on ESX local storage (we followed the SIO performance tuning guide)
8 SSDs dedicated to SIO in JBOD mode (2 per ESX server)
4 SDCs using the ESXi kernel VIB driver, mapping SIO datastores from the SSD pool
A simple guest VM installed on the SIO datastores shows heavily impacted performance.
Running a simple Linux dd bandwidth test on a guest VM using the SIO shared storage, the performance is not consistent: sometimes results are somewhat OK (500 MB/s), sometimes very bad (20 MB/s). Yes, we are purging the Linux FS cache before each run.
System performance degrades tremendously and is inconsistent; in top or iotop you can see high wait times on the SDSs (yes, we are using the noop scheduler, and all SIO datastores are thick-provisioned lazy-zeroed).
It has been 3 days of trials... we are unable to pinpoint where the problem is.
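For reference, one run of our dd bandwidth test looks roughly like this (the target path is an example; `conv=fdatasync` makes dd include the final flush in its timing so the page cache cannot inflate the result):

```shell
# Purge the Linux FS cache so the next run hits the SIO volume, not RAM
# (needs root; skipped silently when not permitted)
sync
echo 3 > /proc/sys/vm/drop_caches 2>/dev/null || true

# Sequential write test; point the output file at the SIO-backed datastore
# under test (the /tmp path here is only an illustration)
dd if=/dev/zero of=/tmp/sio-ddtest bs=1M count=64 conv=fdatasync
rm -f /tmp/sio-ddtest
```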
Your assistance is appreciated.
Thank you!
bogdansmc
12 Posts
0
January 18th, 2016 03:00
fio --name=testfile --readwrite=randread --time_based --runtime=10 --direct=1 --numjobs=3 --iodepth=16 --size=256M --bs=64k
on ESX sdc: 350 IOPS
lat (msec) : 2=23.26%, 4=36.78%, 10=16.80%, 20=10.94%, 50=12.06%
lat (msec) : 2=25.73%, 4=35.97%, 10=16.74%, 20=9.24%, 50=12.32%
lat (msec) : 2=24.72%, 4=37.25%, 10=16.05%, 20=9.18%, 50=12.79%
on linux sdc: 1389 IOPS
lat (msec) : 2=88.74%, 4=7.22%, 10=0.12%, 20=0.60%, 50=0.78%
lat (msec) : 2=87.24%, 4=8.37%, 10=0.29%, 20=0.66%, 50=1.24%
lat (msec) : 2=89.85%, 4=7.14%, 10=0.20%, 20=0.59%, 50=0.65%
A reminder here: all these tests were made on the SAME VM (a Linux CentOS 7 VM) on the same ESX host!
The first disk is ScaleIO via the VMware SDC, then a VMFS datastore on it, then a VMDK disk on the datastore.
The second disk is ScaleIO via the Linux SDC directly in the VM.
There cannot be a network difference between these tests!
PS. I think we are bandwidth-limited in the Linux SDC test (91MB/s on a 1G network)
PPS. We are clearly not bandwidth-limited in the VMware SDC test (22MB/s on a 1G network)
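Those IOPS figures line up with the bandwidth numbers: since fio ran with bs=64k, throughput is simply IOPS × 64 KiB, which can be checked with a quick shell calculation:

```shell
# fio ran with bs=64k, so MiB/s = IOPS * 64 / 1024 (integer approximation)
for iops in 1389 350; do
  echo "$iops IOPS * 64 KiB = $(( iops * 64 / 1024 )) MiB/s"
done
```

That gives roughly 86 MiB/s for the Linux SDC (close to a saturated 1G link) and 21 MiB/s for the ESX SDC, matching the 91 MB/s vs 22 MB/s observation above.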
alexkh
60 Posts
0
January 18th, 2016 05:00
Can you please post the output of: cat /etc/vmware/esx.conf | grep scini
alexkh
60 Posts
0
January 18th, 2016 05:00
There is a problem with this string. Although it shouldn't cause any performance issues, you've configured only a single MDM IP for the SDC, while you have to configure the IPs of all MDMs; otherwise, in case of a switchover between the MDMs, you are going to experience DU (data unavailability).
Sorry for asking more and more questions, however:
1. Did you reboot your ESX hosts after changing the scini parameters?
2. Please post the output of scli --query_all_sds, scli --query_all_sdc, and scli --query_cluster
3. What is the IP of Linux SDC and ESX SDC?
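For reference, the ESX SDC takes its MDM list from the scini module parameters. Adding all MDM IPs, comma-separated, would look roughly like this (a sketch only — the GUID and IPs are examples, and a host reboot is required for the change to take effect):

```shell
# Sketch: configure the ESX SDC with ALL MDM IPs (comma-separated),
# not just one. The GUID must stay unique per host; values are examples.
esxcli system module parameters set -m scini \
  -p "IoctlIniGuidStr=f293c34a-7e89-4029-9551-b9f9d5021f6c IoctlMdmIPStr=192.168.12.46,192.168.12.45"
# Reboot the ESX host so the module loads with the new parameters.
```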
bogdansmc
12 Posts
0
January 18th, 2016 05:00
cat /etc/vmware/esx.conf |grep scini
/vmkernel/module/scini/options = "netConSchedThrd=4 mapTgtSockets=4 netSockRcvBufSize=4194304 netSockSndBufSize=4194304 IoctlIniGuidStr=f293c34a-7e89-4029-9551-b9f9d5021f6c IoctlMdmIPStr=192.168.12.46"
/vmkdevmgr/logical/logical#logical#vmkernel#com.emc.scini0#0/alias = "vmhba64"
bogdansmc
12 Posts
0
January 18th, 2016 06:00
I know; I have a single MDM now for tests (not clustered).
1. Yes, and not only once.
2.
scli --query_all_sds
Query-all-SDS returned 3 SDS nodes.
Protection Domain 6e57fe0700000000 Name: default
SDS ID: ec17512d00000002 Name: SDS_[192.168.12.47] State: Connected, Joined IP: 192.168.12.47 Port: 7072
SDS ID: ec172a1e00000001 Name: SDS_[192.168.12.44] State: Connected, Joined IP: 192.168.12.44 Port: 7072
SDS ID: ec172a1d00000000 Name: SDS_[192.168.12.46] State: Connected, Joined IP: 192.168.12.46 Port: 7072
scli --query_all_sdc
MDM restricted SDC mode: Disabled
Query all SDC returned 7 SDC nodes.
SDC ID: e4dc68b900000000 Name: N/A IP: 192.168.12.45 State: Connected GUID: E2205DA9-D67B-4920-B80F-2D8C4839026F
Read band 0 IOPS 0 Bytes per-second
Write band 0 IOPS 0 Bytes per-second
SDC ID: e4dc68ba00000001 Name: N/A IP: 192.168.12.44 State: Connected GUID: C75BBC0B-2CF5-486E-BDA4-DC7277B00D06
Read band 0 IOPS 0 Bytes per-second
Write band 0 IOPS 0 Bytes per-second
SDC ID: e4dc68bb00000002 Name: N/A IP: 192.168.12.46 State: Connected GUID: 0ED712B1-DEA4-4697-9685-0CFCEC362A9F
Read band 0 IOPS 0 Bytes per-second
Write band 0 IOPS 0 Bytes per-second
SDC ID: e4dc68bc00000003 Name: N/A IP: 192.168.12.47 State: Connected GUID: DBB86316-08A6-488C-B6BC-9C44A8CAC544
Read band 0 IOPS 0 Bytes per-second
Write band 0 IOPS 0 Bytes per-second
SDC ID: e4dc8fc900000004 Name: N/A IP: 192.168.11.88 State: Disconnected GUID: 0F8E18CC-141D-4025-B48A-34846AA24398
Read band 0 IOPS 0 Bytes per-second
Write band 0 IOPS 0 Bytes per-second
SDC ID: e4dcb6d900000005 Name: N/A IP: 192.168.12.2 State: Connected GUID: F293C34A-7E89-4029-9551-B9F9D5021F6C
Read band 0 IOPS 0 Bytes per-second
Write band 1 IOPS 1.0 KB (1024 Bytes) per-second
SDC ID: e4dcb6db00000006 Name: centos7-test1 IP: 192.168.12.188 State: Connected GUID: D8B18D91-793E-4E68-9740-786B0B8F676F
Read band 0 IOPS 0 Bytes per-second
Write band 0 IOPS 0 Bytes per-second
scli --query_cluster
Mode: Single, Cluster State: N/A, Tie-Breaker State: N/A
Primary MDM IP: 192.168.12.46
Secondary MDM IP: 192.168.12.45
Tie-Breaker IP: 192.168.12.44
Management IP: 192.168.12.46, 192.168.12.45
Name: N/A
3. Linux SDC has IP 192.168.12.188
ESX SDC has IP 192.168.12.2
alexkh
60 Posts
0
January 19th, 2016 02:00
If I had to guess, I would say there is a network utilization issue somewhere between the SDC on ESX and ScaleIO.
While running large IO sizes, you were able to saturate the network on the Linux SDC (91 MB/s, which is more or less a full 1Gb network); however, the same test on ESX shows less than 25% of that.
I would suggest checking your vSwitch configuration, the vmkernel port, and the physical NIC dedicated to the vmkernel port.
Are you using the same physical NIC and vSwitch for the Linux VM hosting the SDC and for the vmkernel port used by the ESX SDC?
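To compare the two paths, it can help to dump the host's vSwitch, vmkernel port, and physical NIC configuration with the standard esxcli queries (a diagnostic sketch, not a fix):

```shell
# Which physical NICs back each vSwitch (uplinks, MTU)
esxcli network vswitch standard list
# vmkernel ports and their IP configuration
esxcli network ip interface list
# Link speed/duplex of the physical NICs
esxcli network nic list
```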
bogdansmc
12 Posts
0
January 19th, 2016 02:00
Yes, I use 3 Linux SDS machines with 1Gb networks and one ESX machine with a 10Gb network.
I use one 10G connection for all vmkernel ports (for the Linux VM and for the ESX SDC as well).
I don't know how I can reserve or increase shares for the ESX SDC, as it is not a virtual machine. Maybe there is a bottleneck here, I don't know (there are free memory and CPU resources; CPU utilization is less than 20% on this host).
bogdansmc
12 Posts
0
January 19th, 2016 03:00
I made one more test.
On my ESX host I used dd to exercise the ScaleIO disk:
[root@esxi-01:/vmfs/volumes/5696d7c5-9e0924c3-3300-001b2180923c] time dd if=/dev/zero of=./t1 bs=1M count=100
100+0 records in
100+0 records out
real 0m 7.60s
user 0m 0.06s
sys 0m 0.00s
This gives me only 13 MB/s.
I get a slightly better result on a 4-HDD ZFS NFS NAS (6.19 s)
and on a local SATA HDD (5.29 s).
PS. Reminder: I use 3 SSD disks in ScaleIO.
Is there a way to test SDC performance on the ESX host (not in a VM) to exclude the VM overhead?
Are there any SDC settings besides those in the fine-tuning guide?
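The 13 MB/s figure follows directly from the timing above (100 × 1 MiB blocks in 7.60 s of real time):

```shell
# 100 MiB written in 7.60 s (real time reported by dd above)
awk 'BEGIN { printf "%.1f MiB/s\n", 100 / 7.60 }'
```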
alexkh
60 Posts
0
January 19th, 2016 04:00
I'm not familiar with any way to issue IOs directly from ESX.
One more thing: are you using a paravirtual controller to map an RDM to the VM?
bogdansmc
12 Posts
0
January 19th, 2016 06:00
Yes, I use a paravirtual controller.
VIDGuide
1 Message
0
March 14th, 2016 18:00
Did you ever end up finding a solution to this problem? We're looking at a very similar setup, in a very similar situation. Any pointers or tips for moving forward from here?