hkraal

41 Posts

34617

September 8th, 2014 03:00

6100XV vs 6210X: Difference in write speed/latency

Hi,

In November 2011 we purchased a 6100XV, and recently a 6210X was racked to be placed in the same group as the old unit. For testing purposes both units are running in their own group for now, until I'm happy with what I see.

The setup:
As storage we have a PS6100XV (4x 1Gbit + MGT) and a 6210X (2x 10Gbit (copper or fiber) + MGT), both running FW 7.07 and connected to our iSCSI switches.

Our storage network consists of two M6348s placed in our M1000E chassis. The switches are stacked and running version 5.1.3.7.

The processing power for our vSphere cluster is coming from our M1000E chassis with 4 blades (M710/M820). Each blade is running vSphere 5.5 (1892794) with MEM 1.2 installed and is connected with 4x 1 Gbit to the internal M6348 ports using 4 paths.

The 'problem':
When dd-ing 2 GB of data from /dev/zero to disk within the VM I see different results. Please note that I'm using the same VMware host, network, etc.; only the EqualLogic arrays differ:
   6100XV   : 101 MB/s
   6210X   : 239 MB/s

Looking in esxtop I see the following on my iSCSI ports during the 2GB write:
6100XV:
   Latency rises to 7 ms (seen by SanHQ as well)
   We see an equally spread load across all 4 links of ~256 Mbit

Esxtop - network
   PORT-ID              USED-BY TEAM-PNIC DNAME              PKTTX/s MbTX/s    PKTRX/s MbRX/s %DRPTX %DRPRX
50331658                 vmk2     vmnic4 vSwitch1           2182.96 256.54    3934.29    4.20   0.00   0.00
50331659                 vmk3     vmnic5 vSwitch1           2249.15 265.84    4087.83    4.37   0.00   0.00
50331660                 vmk4     vmnic8 vSwitch1           2182.39 269.51    4053.12    4.44   0.00   0.00
50331661                 vmk5     vmnic9 vSwitch1           2155.30 267.10    4032.52    4.43   0.00   0.00

Esxtop - virtual disk
     GID VMNAME           VDEVNAME NVDISK   CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s LAT/rd LAT/wr
    7115 mail38                  -      2   576.02    69.81   506.21     1.06   118.68 22.27   7.01

6210X:
   Latency rises to 74 ms (seen by SanHQ as well)
   We see a nearly maxed-out load across both links of 900+ Mbit

Esxtop - network
   PORT-ID              USED-BY TEAM-PNIC DNAME              PKTTX/s MbTX/s    PKTRX/s MbRX/s %DRPTX %DRPRX
50331658                 vmk2     vmnic4 vSwitch1          10995.86 918.20   16286.85   14.44   0.00   0.00
50331659                 vmk3     vmnic5 vSwitch1          11036.87 945.63   16856.19   11.83   0.00   0.00
50331660                 vmk4     vmnic8 vSwitch1            247.96    8.55     251.77    3.59   0.00   0.00
50331661                 vmk5     vmnic9 vSwitch1            258.45    8.35     251.77    4.51   0.00   0.00

Esxtop - virtual disk
     GID VMNAME           VDEVNAME NVDISK   CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s LAT/rd LAT/wr
3477742 eql02bench01            -      1   436.24     2.15   434.09     0.01   216.48 11.68 74.16

As the 6210X only has two active ports it only utilises two paths, but as you can see it can combine both links into a 2 Gbit connection, whereas the 6100XV cannot combine all four links into a 4 Gbit connection. Besides the bandwidth 'issue' I'm seeing a huge difference in write latency between the two units.
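As a rough cross-check of the esxtop network tables above, the per-link MbTX/s values can be summed and converted to MB/s. This is only a sketch: it assumes esxtop's "Mb" means 10^6 bits and that dividing by 8 gives a comparable MB/s figure, and the `aggregate` helper is a hypothetical name used for illustration.

```python
# MbTX/s values copied from the two "Esxtop - network" tables in this post.
tx_mbit = {
    "6100XV": [256.54, 265.84, 269.51, 267.10],
    "6210X": [918.20, 945.63, 8.55, 8.35],
}


def aggregate(rates_mbit):
    """Return (total Mbit/s, total MB/s) for a list of per-link rates.

    Assumption: 8 bits per byte and decimal megabits; protocol overhead
    (Ethernet, TCP, iSCSI headers) is NOT subtracted.
    """
    total_mbit = sum(rates_mbit)
    return total_mbit, total_mbit / 8


for array, rates in tx_mbit.items():
    total_mbit, total_mbyte = aggregate(rates)
    print(f"{array}: {total_mbit:.2f} Mbit/s on the wire, about {total_mbyte:.1f} MB/s")
```

Under those assumptions the four 6100XV links add up to roughly 1059 Mbit/s and the 6210X links to roughly 1881 Mbit/s. Wire rates include protocol overhead, so they sit a little above the MBWRTN/s values in the virtual disk tables.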

Questions:
1) What is the difference between the two units that causes one to be limited to 1 Gbit and the other not?
2) Why is the write latency so high? Both controllers are in the green so write back should be active.

Regards,

- Henk

Responses(10)

A

Anonymous

5 Practitioner

•

274.2K Posts

0

September 10th, 2014 09:00

Looking at the average blocksize the latency is appropriate. Large blocksizes mean more latency but greater speed.
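The average blocksize mentioned here can be estimated from the esxtop virtual disk rows in the original post by dividing MBWRTN/s by WRITES/s. The sketch below also applies Little's law (items in flight = arrival rate x time in system) to estimate how many writes were outstanding; that step is a general queueing result added for illustration, not something stated in the thread, and it assumes 1 MB = 1024 KB.

```python
# Values copied from the "Esxtop - virtual disk" rows in the original post.
samples = {
    "6100XV": {"writes_per_s": 506.21, "mb_written_per_s": 118.68, "write_latency_ms": 7.01},
    "6210X": {"writes_per_s": 434.09, "mb_written_per_s": 216.48, "write_latency_ms": 74.16},
}


def average_write_kb(sample):
    """Average size of one write in KB (assumes 1 MB = 1024 KB)."""
    return sample["mb_written_per_s"] * 1024 / sample["writes_per_s"]


def writes_in_flight(sample):
    """Little's law estimate: writes per second times latency in seconds."""
    return sample["writes_per_s"] * sample["write_latency_ms"] / 1000


for array, sample in samples.items():
    print(
        f"{array}: about {average_write_kb(sample):.0f} KB per write, "
        f"about {writes_in_flight(sample):.1f} writes in flight"
    )
```

With these numbers the 6210X run averages roughly 511 KB per write with about 32 writes in flight, against roughly 240 KB and about 3.5 in flight on the 6100XV, so the two runs were not presenting the same workload to the arrays.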

I'm guessing you are running Linux?

Try 'dt'

www.scsifaq.org/.../dt.html

Once you compile it, try something like this:

./dt of= bs=8k pattern=iot disable=compare capacity=

e.g.

./dt of=/mnt/disk/test.dt bs=8k pattern=iot disable=compare capacity=20G

Then try with bs=64k.

This will do a sequential write then read. The disable=compare will prevent checking the read data for errors, which slows down the test.

Then try writing to a raw partition.

./dt of=/dev/sdx bs=8k pattern=iot disable=compare capacity=

And again with 64K.

When you run to a filesystem it tends to break up large writes.

If you want to run multiple passes add passes=X at the end.

Regards,

A

Anonymous

5 Practitioner

•

274.2K Posts

0

September 11th, 2014 09:00

Well, now that you have a Windows system to test with, try testing with the 6100. You should be able to achieve greater than 100 MB/s there as well, especially with filesystem overhead removed from the equation.

There's no technical reason that the 6100 can't do greater than GbE speed. It won't do 4x GbE, but that's down to what the disks can actually provide.

A

Anonymous

5 Practitioner

•

274.2K Posts

0

September 8th, 2014 10:00

You might also want to take a look at this thread. It has a link for best practices for ESXi + EQL

en.community.dell.com/.../19598993

A

Anonymous

5 Practitioner

•

274.2K Posts

0

September 8th, 2014 10:00

Hello,

You didn't mention what RAID level you are running, but there is a world of difference between those two controllers. Faster, more cache, faster cache, and HW assist for parity calculation are just a few of the improvements. You also didn't indicate what block size you are using for the 'dd' test. Large blocksizes yield greater MB/sec but increase latency.

Regards,

hkraal

41 Posts

0

September 9th, 2014 03:00

Both units are running RAID-10, both VMs are using a 4K blocksize on their FS, and the EQLs are using the default blocksizes (afaik 4K).

I'm aware of the major differences between the units; I don't expect them to be equal in performance. I do however expect that the new unit, with the better hardware, shouldn't show 70 ms latency when writing ~238 MB/s to it. I'm under the assumption that such latency spikes could be bad for overall performance when we take the 6210 into production.

As far as I know I'm following all current best practices:

vmware 5.5

jumbo frames

removed the storage heartbeat VMK's during last VMware updates

using Dell Multipath Extension Module (MEM 1.2)

Delayed ACK is disabled

Large receive offload is disabled

Storage I/O control isn't used

Meanwhile I've opened a case (900308914) referring to this topic as well. I'll keep the thread updated as the case moves forward.

A

Anonymous

5 Practitioner

•

274.2K Posts

0

September 9th, 2014 09:00

With block storage there isn't a "blocksize" per se. RAID sets use a stripe size, which on EQL is 64K. Blocks on EQL storage, as with many vendors, are arranged into larger groupings, called pages in EQL parlance. EQL uses a 15MB page size.

Instead of dd, have you tried IOmeter, or SQLIO?

Since you are testing inside the guest, if you haven't already done so, create a 2nd VMDK, on its own Virtual SCSI adapter, do not format it.

In IOmeter set the outstanding IOs = 64, workers =1. Do some 100% read and write tests with 8K, and 64K blocksizes.

Re: Latency. When you ran 'dd', what did you give it for blocksize there?

Also, note that you were doing over 200MB/sec, which isn't a normal iSCSI pattern, especially with ESXi. If you have SANHQ and look at the IO history of your other array I believe you will find you've never approached that level of throughput.

Note: When you format NTFS partitions, whether physical or virtualized, set the NTFS cluster size to 64K. This helps optimize I/O on the RAID set's 64K stripe size.
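To illustrate why matching the 64K stripe size can help, here is a deliberately simplified sketch that counts how many 64K stripes a single I/O touches. It assumes the I/O offset is measured from the start of a stripe and ignores every intermediate layer (VMFS, partition offsets, EQL pages); `stripes_touched` is a hypothetical helper, not an EqualLogic or NTFS API.

```python
STRIPE_KB = 64  # stripe size stated in the reply above


def stripes_touched(offset_kb, length_kb, stripe_kb=STRIPE_KB):
    """Count the stripes covered by an I/O starting at offset_kb.

    Simplified model: offsets are relative to a stripe boundary and no
    other layer shifts the alignment.
    """
    if length_kb <= 0:
        raise ValueError("length_kb must be positive")
    first = offset_kb // stripe_kb
    last = (offset_kb + length_kb - 1) // stripe_kb
    return last - first + 1


# A 64 KB I/O on a 64 KB boundary stays inside one stripe.
print(stripes_touched(0, 64))
# The same I/O shifted by 4 KB straddles two stripes.
print(stripes_touched(4, 64))
```

In this model an aligned 64 KB cluster maps onto exactly one stripe, while a misaligned one of the same size spans two.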

hkraal

41 Posts

0

September 10th, 2014 09:00

Hi Don,

I don't have a Windows machine which runs on the 6210, so IOmeter isn't an option. I've played a bit with bonnie++, which produced the high write latency in the first place. As I could reproduce the latency spike with the 'dd' command I haven't used it anymore.

The 'dd' command has been run with a blocksize of 1M and 4k; both yield the same results (200+ MB/s and 60+ ms write latency).

The 200+ MB/s is actually achieved and visible in SanHQ as well:

https://www.dropbox.com/s/u3hsytgxti18vil/Selection_081.png?dl=0

hkraal

41 Posts

0

September 11th, 2014 04:00

Hi Don,

We're mostly a Linux shop (Ubuntu), but yesterday I quickly installed a Windows machine; since we have a virtual environment, why not make use of it, right?

After some tinkering I managed to get a nice sequential write IO using 4, 32, 64 and 256 KB blocks and (of course) you're right; the high latency was gone:

4KB yielded 150 MB/s throughput with < 1.0 ms latency

32KB yielded 215 MB/s throughput with < 1.0 ms latency

64KB yielded 215 MB/s throughput with < 1.0 ms latency

256KB yielded 219 MB/s throughput with 65 ms latency
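For comparison across block sizes, the throughput figures above can be turned into approximate writes per second. This is a small sketch assuming 1 MB = 1024 KB and that every write is exactly the configured block size; `writes_per_second` is a name introduced here for illustration.

```python
# (block size in KB, throughput in MB/s) as reported in the list above.
results = [(4, 150), (32, 215), (64, 215), (256, 219)]


def writes_per_second(block_kb, mb_per_s, kb_per_mb=1024):
    """Approximate write rate implied by a throughput at a fixed block size."""
    return mb_per_s * kb_per_mb / block_kb


for block_kb, mb_per_s in results:
    rate = writes_per_second(block_kb, mb_per_s)
    print(f"{block_kb:>3} KB at {mb_per_s} MB/s -> about {rate:.0f} writes/s")
```

The throughput is nearly flat from 32 KB upward, so the number of writes per second drops almost in proportion to the block size.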

I know that bigger chunks of data result in higher latency, but I didn't realise it was that much, as I didn't experience the same 'issue' on both EqualLogics. Just because I'm curious: do you have any hypothesis as to what could cause the difference?

With the latency question cleared up, I've only one question remaining:

Why is the 6210 (connected via multiple 1 Gbit ports) not limited to 1 Gbit throughput the way the 6100 is, which (as you mentioned) is awkward behaviour in our case?

- Henk

hkraal

41 Posts

0

September 15th, 2014 01:00

I've been out of office for a couple of days so I hadn't tested it yet but:

You're right again. Doing the same test on the 6100 yields a 1 Gbit+ write speed; to be exact:

4KB yielded 76 MB/s throughput with < 3 ms latency

64KB yielded 168 MB/s throughput with < 3 ms latency
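To see which of these results exceeds what one 1 Gbit link can carry, the MB/s figures can be converted to Mbit/s. A minimal sketch, assuming 1 MB = 10^6 bytes and ignoring protocol overhead (which would only push the wire rate higher); the helper names are made up for this example.

```python
GBE_LINK_MBIT = 1000  # nominal line rate of a single 1 Gbit link


def payload_mbit(mb_per_s):
    """Payload rate in Mbit/s (assumes decimal megabytes, 8 bits per byte)."""
    return mb_per_s * 8


def needs_multiple_links(mb_per_s):
    """True when the payload alone exceeds one 1 Gbit link."""
    return payload_mbit(mb_per_s) > GBE_LINK_MBIT


# (block size in KB, throughput in MB/s) from the 6100 results above.
for block_kb, mb_per_s in [(4, 76), (64, 168)]:
    print(
        f"{block_kb} KB: {payload_mbit(mb_per_s)} Mbit/s, "
        f"more than one link needed: {needs_multiple_links(mb_per_s)}"
    )
```

Under these assumptions only the 64 KB run goes past a single link's capacity, which means it must have been spread over more than one path.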

I think I'm starting to like the 6210... very much ;)

Thank you for the help. I hope that Dell is aware of the very helpful information you're providing here; if not, please send me your manager's address. I assume you can find my info through the case number (SR900308914) ;)

A

Anonymous

5 Practitioner

•

274.2K Posts

0

September 15th, 2014 08:00

Thank you for the update and the compliment. I'm glad I was able to assist you.

Regards,
