December 8th, 2015 07:00

ScaleIO VMware Poor Performance

We have had good success with ScaleIO running on dedicated CentOS 7 bare-metal servers.

The environment is 4 nodes running CentOS 7, dual 10Gb attached, with local SSDs as JBOD storage.

SDC client performance is great and the systems are very stable under different workloads.

We have not been successful running the SDS within VMware in a hyper-converged fashion.

We took the exact same systems as above and re-installed them as follows:

The environment is the SAME 4 nodes, this time running ESX 6.0u1a, dual 10Gb attached, with local SSDs as JBOD storage.


Hardware is the same, network is the same; we just replaced the OS with ESX.

Each node has an LSI controller in direct JBOD mode (controller cache is bypassed)

2 MDMs running as CentOS 7 guest OS on ESX local storage

1 Tie-Breaker (TB) running as CentOS 7 guest OS on ESX local storage

4 SDSs running as CentOS 7 guest OS on ESX local storage (we followed the SIO performance tuning guide)

8 SSDs dedicated to SIO in JBOD mode (2 per ESX server)

4 SDCs using the ESXi kernel VIB driver, mapping SIO datastores from the SSD pool

A simple VM guest installed on the SIO datastores shows heavily impacted performance.

When we run a simple Linux dd bandwidth test on a guest VM using the SIO shared storage, the performance is not consistent: sometimes the results are somewhat OK (500MB/s), and sometimes performance is very bad (20MB/s). Yes, we are purging the Linux FS cache before each run.

System performance degrades tremendously and is inconsistent. Running top or iotop shows high I/O wait time on the SDSs. (Yes, we are using the NOOP scheduler, and all SIO datastores are thick-provisioned lazy zeroed.)

It has been 3 days of trials... we are unable to pinpoint where the problem is.

Your assistance is appreciated.

Thank you!

12 Posts

January 18th, 2016 03:00

fio --name=testfile --readwrite=randread --time_based --runtime=10 --direct=1 --numjobs=3 --iodepth=16 --size=256M --bs=64k


on ESX sdc: 350 IOPS

    lat (msec) : 2=23.26%, 4=36.78%, 10=16.80%, 20=10.94%, 50=12.06%

    lat (msec) : 2=25.73%, 4=35.97%, 10=16.74%, 20=9.24%, 50=12.32%

    lat (msec) : 2=24.72%, 4=37.25%, 10=16.05%, 20=9.18%, 50=12.79%


on linux sdc: 1389 IOPS

    lat (msec) : 2=88.74%, 4=7.22%, 10=0.12%, 20=0.60%, 50=0.78%

    lat (msec) : 2=87.24%, 4=8.37%, 10=0.29%, 20=0.66%, 50=1.24%

    lat (msec) : 2=89.85%, 4=7.14%, 10=0.20%, 20=0.59%, 50=0.65%
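
A rough way to compare the two latency distributions, as a sketch only: fio reports each `lat (msec)` entry as the percentage of I/Os that fell into that bucket, so adding the 20 ms and 50 ms buckets gives the share of I/Os that took 10 ms or longer. The numbers are copied from the output above; the helper name `tail_share` is made up for illustration, and buckets not shown in the post are ignored.

```python
# Per-job fio latency buckets copied from the two runs above (percent of I/Os).
# A bucket labelled N holds I/Os that finished in under N ms but missed the
# previous, smaller bucket.
esx_jobs = [
    {2: 23.26, 4: 36.78, 10: 16.80, 20: 10.94, 50: 12.06},
    {2: 25.73, 4: 35.97, 10: 16.74, 20: 9.24, 50: 12.32},
    {2: 24.72, 4: 37.25, 10: 16.05, 20: 9.18, 50: 12.79},
]
linux_jobs = [
    {2: 88.74, 4: 7.22, 10: 0.12, 20: 0.60, 50: 0.78},
    {2: 87.24, 4: 8.37, 10: 0.29, 20: 0.66, 50: 1.24},
    {2: 89.85, 4: 7.14, 10: 0.20, 20: 0.59, 50: 0.65},
]


def tail_share(job, threshold_ms=10):
    """Percent of I/Os in buckets above threshold_ms (hypothetical helper)."""
    return sum(pct for bucket, pct in job.items() if bucket > threshold_ms)


for name, jobs in (("ESX SDC", esx_jobs), ("Linux SDC", linux_jobs)):
    shares = [round(tail_share(job), 2) for job in jobs]
    print(name, "share of I/Os at 10 ms or more, per job:", shares)
```

On these figures roughly a fifth of the I/Os through the ESX SDC land in the slow buckets, against one to two percent through the Linux SDC.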


A reminder: all of these tests were run on the SAME VM (a Linux CentOS 7 VM) on the same ESX host!

The first disk is ScaleIO via the VMware SDC, then a VMFS datastore on it, then a vmdk disk on that datastore.

The second disk is ScaleIO via the Linux SDC running directly in the VM.

There cannot be any network difference between these tests!


PS. I think we are hitting the bandwidth limit in the Linux SDC test (91MB/s on a 1G network).

PPS. We are not hitting a bandwidth limit in the VMware SDC test (22MB/s on a 1G network).
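
The bandwidth figures quoted here follow from the IOPS results and the fio block size. A minimal check, assuming `--bs=64k` means 64 KiB and that MB/s is decimal megabytes per second (the function name is illustrative):

```python
BLOCK_SIZE = 64 * 1024  # bytes per I/O, from --bs=64k in the fio command


def bandwidth_mb_s(iops, block_size=BLOCK_SIZE):
    """Throughput in decimal MB/s for a given IOPS figure and block size."""
    return iops * block_size / 1e6


print(round(bandwidth_mb_s(350), 1))   # ESX SDC run   -> 22.9
print(round(bandwidth_mb_s(1389), 1))  # Linux SDC run -> 91.0
```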


60 Posts

January 18th, 2016 05:00

Can you please post the output of: cat /etc/vmware/esx.conf | grep scini

60 Posts

January 18th, 2016 05:00

There is a problem with this string, although it shouldn't cause any performance issues: you've configured only a single MDM IP for the SDC, whereas you have to configure the IPs of all MDMs. Otherwise, in case of a switchover between the MDMs, you are going to experience data unavailability (DU).

Sorry for asking more and more questions, however:

1. Did you reboot your ESX hosts after changing the scini parameters?

2. Please post the output of scli --query_all_sds, scli --query_all_sdc, scli --query_cluster

3. What is the IP of Linux SDC and ESX SDC?

12 Posts

January 18th, 2016 05:00

cat /etc/vmware/esx.conf |grep scini

/vmkernel/module/scini/options = "netConSchedThrd=4 mapTgtSockets=4 netSockRcvBufSize=4194304 netSockSndBufSize=4194304 IoctlIniGuidStr=f293c34a-7e89-4029-9551-b9f9d5021f6c IoctlMdmIPStr=192.168.12.46"

/vmkdevmgr/logical/logical#logical#vmkernel#com.emc.scini0#0/alias = "vmhba64"

12 Posts

January 18th, 2016 06:00

I know; I have a single MDM for now, for testing (not clustered).

1. Yes, and more than once.

2.

scli --query_all_sds

Query-all-SDS returned 3 SDS nodes.

Protection Domain 6e57fe0700000000 Name: default

SDS ID: ec17512d00000002 Name: SDS_[192.168.12.47] State: Connected, Joined IP: 192.168.12.47 Port: 7072

SDS ID: ec172a1e00000001 Name: SDS_[192.168.12.44] State: Connected, Joined IP: 192.168.12.44 Port: 7072

SDS ID: ec172a1d00000000 Name: SDS_[192.168.12.46] State: Connected, Joined IP: 192.168.12.46 Port: 7072

scli --query_all_sdc

MDM restricted SDC mode: Disabled

Query all SDC returned 7 SDC nodes.

SDC ID: e4dc68b900000000 Name: N/A IP: 192.168.12.45 State: Connected GUID: E2205DA9-D67B-4920-B80F-2D8C4839026F

    Read band 0 IOPS 0 Bytes per-second

    Write band 0 IOPS 0 Bytes per-second

SDC ID: e4dc68ba00000001 Name: N/A IP: 192.168.12.44 State: Connected GUID: C75BBC0B-2CF5-486E-BDA4-DC7277B00D06

    Read band 0 IOPS 0 Bytes per-second

    Write band 0 IOPS 0 Bytes per-second

SDC ID: e4dc68bb00000002 Name: N/A IP: 192.168.12.46 State: Connected GUID: 0ED712B1-DEA4-4697-9685-0CFCEC362A9F

    Read band 0 IOPS 0 Bytes per-second

    Write band 0 IOPS 0 Bytes per-second

SDC ID: e4dc68bc00000003 Name: N/A IP: 192.168.12.47 State: Connected GUID: DBB86316-08A6-488C-B6BC-9C44A8CAC544

    Read band 0 IOPS 0 Bytes per-second

    Write band 0 IOPS 0 Bytes per-second

SDC ID: e4dc8fc900000004 Name: N/A IP: 192.168.11.88 State: Disconnected GUID: 0F8E18CC-141D-4025-B48A-34846AA24398

    Read band 0 IOPS 0 Bytes per-second

    Write band 0 IOPS 0 Bytes per-second

SDC ID: e4dcb6d900000005 Name: N/A IP: 192.168.12.2 State: Connected GUID: F293C34A-7E89-4029-9551-B9F9D5021F6C

    Read band 0 IOPS 0 Bytes per-second

    Write band 1 IOPS 1.0 KB (1024 Bytes) per-second

SDC ID: e4dcb6db00000006 Name: centos7-test1 IP: 192.168.12.188 State: Connected GUID: D8B18D91-793E-4E68-9740-786B0B8F676F

    Read band 0 IOPS 0 Bytes per-second

    Write band 0 IOPS 0 Bytes per-second

scli --query_cluster

Mode: Single, Cluster State: N/A, Tie-Breaker State: N/A

Primary MDM IP: 192.168.12.46

Secondary MDM IP: 192.168.12.45

Tie-Breaker IP: 192.168.12.44

Management IP: 192.168.12.46, 192.168.12.45

Name: N/A

3. Linux SDC has IP 192.168.12.188

ESX SDC has IP 192.168.12.2

60 Posts

January 19th, 2016 02:00

If I had to guess, I would say there is a network utilization issue somewhere between the SDC on ESX and ScaleIO.

With a large IO size, you were able to saturate the network on the Linux SDC (91MB/s, which is more or less a 1Gb network); however, the same test on ESX reaches less than 25% of that.
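
To put numbers on that comparison, here is a small sketch. It assumes the raw 1 Gb/s line rate (125 MB/s, ignoring Ethernet and TCP overhead) as the ceiling, and uses the 91 MB/s and 22 MB/s figures reported earlier in the thread:

```python
LINK_MB_S = 1e9 / 8 / 1e6  # raw 1 Gb/s line rate = 125 MB/s, no protocol overhead


def utilization_pct(observed_mb_s, link_mb_s=LINK_MB_S):
    """Observed throughput as a percentage of the link's raw capacity."""
    return 100 * observed_mb_s / link_mb_s


linux_sdc_mb_s = 91  # reported for the Linux SDC test
esx_sdc_mb_s = 22    # reported for the ESX SDC test

print(round(utilization_pct(linux_sdc_mb_s), 1))       # 72.8 percent of line rate
print(round(utilization_pct(esx_sdc_mb_s), 1))         # 17.6 percent of line rate
print(round(100 * esx_sdc_mb_s / linux_sdc_mb_s, 1))   # 24.2 percent of the Linux result
```

Either way you measure it, the ESX SDC result stays under a quarter of what the same VM achieves through the Linux SDC.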

I would suggest checking your vSwitch configuration, the vmkernel port, and the physical NIC dedicated to the vmkernel port.

Are you using the same physical NIC and vSwitch for the Linux VM hosting the SDC and for the vmkernel port used by the ESX SDC?

12 Posts

January 19th, 2016 02:00

Yes. I use 3 Linux SDS machines with 1Gb networks and one ESX machine with a 10Gb network.

I use one 10G connection for all vmkernel ports (for the Linux VM and for the ESX SDC as well).

I don't know how I can reserve or increase shares for the ESX SDC, as it is not a virtual machine. Maybe there is a bottleneck here, I don't know. There are free memory and CPU resources (CPU utilization is less than 20% on this host).

12 Posts

January 19th, 2016 03:00

I ran one more test.

On my ESX host I used dd to write to the ScaleIO disk:

[root@esxi-01:/vmfs/volumes/5696d7c5-9e0924c3-3300-001b2180923c] time dd if=/dev/zero of=./t1 bs=1M count=100

100+0 records in

100+0 records out

real 0m 7.60s

user 0m 0.06s

sys 0m 0.00s

This gives me only 13MB/s

I get slightly better results on a 4-HDD ZFS NFS NAS (6.19s)

and on a local SATA HDD (5.29s)
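
For reference, the three timings convert to throughput as follows. This is a sketch that assumes the same `bs=1M count=100` dd command was used for all three targets and that `bs=1M` means 1 MiB; the wall-clock time also includes dd start-up, so the figures are approximate.

```python
def dd_throughput_mib_s(count, block_bytes, seconds):
    """Average write rate in MiB/s for a dd run of `count` blocks."""
    return count * block_bytes / seconds / 2**20


BLOCK = 1024 * 1024  # bs=1M
COUNT = 100          # count=100

timings = {
    "ScaleIO datastore": 7.60,
    "4-HDD ZFS NFS NAS": 6.19,
    "local SATA HDD": 5.29,
}

for target, seconds in timings.items():
    print(target, round(dd_throughput_mib_s(COUNT, BLOCK, seconds), 1), "MiB/s")
```

That works out to about 13.2, 16.2 and 18.9 MiB/s respectively.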

PS. Reminder: I use 3 SSD disks in ScaleIO.

Is it possible to test SDC performance on the ESX host itself (not in a VM) to exclude VM overhead?

Are there any SDC settings besides those in the fine-tuning guide?

60 Posts

January 19th, 2016 04:00

I'm not familiar with any way to issue IOs directly from ESX.

One more thing: are you using a paravirtual controller to map an RDM to a VM?

12 Posts

January 19th, 2016 06:00

Yes, I use paravirtual controller

1 Message

March 14th, 2016 18:00

Did you ever end up finding a solution here to this problem? We're looking at a very similar setup, and very similar situation. Any pointers or tips moving forward from here?
