Unsolved


1 Rookie

 • 

7 Posts


October 4th, 2013 12:00

Poor disk performance

We have a two-node Microsoft Windows Server 2012 Hyper-V cluster using a dual-controller SAS MD3200. Everything is working fine in the cluster, but disk performance seems to be very poor.

We have defined three disk groups:

- First is a RAID-5 with four 15K SAS-II 3.5" disks; we have defined only one SAN virtual disk here, formatted it and assigned it to the cluster as a CSV. This SAN virtual disk hosts six fixed-size VHDs with the system partitions of six Windows Server 2008 R2 VMs.

- Second is a RAID-1 with two 15K SAS-II 3.5" disks; again we have defined only one SAN virtual disk here, formatted it and assigned it to the cluster as a CSV. This SAN virtual disk hosts two fixed-size VHDs with the data partitions of two Windows Server 2008 R2 VMs. These servers are database servers and need the best performance we can get. The second server is defined as a standby, so at any given time this SAN virtual disk effectively has only one VHD working.

- Third disk group is a RAID-5 with two 7.2K NL-SAS 3.5" disks; we have defined four SAN virtual disks here, all formatted and assigned to the cluster as CSVs. These hold some VHDs with lower workload requirements.

The problem is that the data VHD of the main database server performs very badly in terms of disk access, so we ran some tests and tried some changes. We ran HD Tune tests after moving this VHD to the SAS RAID-5 disk group, to the SAS RAID-1 disk group, to the NL-SAS disk group and to an iSCSI LUN presented from a low-cost SATA NAS used for backups (surprise: this gave the best HD Tune results by far!). The tests were done with no workload on the system (all servers running, but without users and without processes running).

I'm including the HD tune test results.

Dell support is telling us that these IOPS results are normal, but I think they are very poor, and I cannot find any explanation for how the Dell SAN test results compare against the SATA NAS test results.

Can someone please confirm whether this is the best the MD3200 can do? If I should expect better IOPS results, any idea what could be going wrong with our setup?

Thanks.

1) NL-SAS RAID-1 SAN: [HD Tune screenshot]

2) SAS RAID-1 SAN: [HD Tune screenshot]

3) SAS RAID-5 SAN: [HD Tune screenshot]

4) SATA RAID-5 NAS: [HD Tune screenshot]

Thanks.

1 Rookie

 • 

7 Posts

October 8th, 2013 10:00

Hi:

Does nobody have any information about these MD3200 performance results? I need some help with this. Thanks in advance.

1 Rookie

 • 

7 Posts

October 10th, 2013 01:00

OK. I must conclude that the Dell MD3200 is as bad a product as it looks. Thank you for nothing.

685 Posts

October 10th, 2013 12:00

Dallas Maverick,

I am very sorry for the issues you are running into. On the MD3200 SAS array there are not many things you can do in the area of performance tuning. I am including the deployment guide for this array below, just so you have it if you need to refer to it.

ftp.dell.com/.../powervault-md3200_Deployment%20Guide_en-us.pdf

I had a couple of questions as well. What kind of servers are you using, and which slot in those servers is the SAS card plugged into? Also, which SAS card are you using?

8 Posts

October 11th, 2013 04:00

Hi Kenny:

Thanks for your answer. The main question is that I'm not sure whether these test results are normal for this system. I'm assuming they're not, basically because the results for the 4-disk RAID-5 volume are almost the same as for a single standalone disk, and because I think it's impossible that an iSCSI NAS with standard SATA disks can perform better than a SAS SAN with SAS disks.

Answering your questions:

- Servers are both Dell R620.

- Each server has two dual-port dedicated SAS HBAs (Dell 6 Gbps SAS HBA, external). Each HBA in each server is connected to one port on one of the two array controller modules, so each server is connected to both array modules.

- The HBA cards are plugged into PCI slots 0 and 1 of the two servers.

8 Posts

October 11th, 2013 04:00

One update: the PCIe slots are numbered 1 and 2 in the server chassis. PCIe slot 1 is x8 and PCIe slot 2 is x16 in both servers. All HBAs are PCIe x8.

8 Posts

October 15th, 2013 05:00

Hi Kenny:

Do you have any other information or questions that could help me solve this issue? I'm desperate with this problem, and Dell support is not helping me.

203 Posts

October 15th, 2013 11:00

I might suggest that you run IOMeter to do your testing. This doesn't discount your results from HD Tune, but rather gives results that can be compared more easily. Also, IOMeter accounts for alignment of I/Os: many benchmarking tools were written for local physical disks, and sector boundaries can lead to inconsistent results. IOMeter allows you to adjust this.
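If you want a quick cross-check of the HD Tune numbers while you get IOMeter set up, a rough random-read script is easy to throw together. Treat this as a sanity check only, not a benchmark: the file path below is hypothetical, and since it uses ordinary reads the test file needs to be much larger than the host's RAM or the OS cache will inflate the numbers.

```python
# Rough random-read IOPS sanity check -- not a replacement for IOMeter.
# TEST_FILE is a hypothetical path; point it at a large pre-created file
# on the volume under test (much larger than RAM to defeat OS caching).
import os
import random
import time

TEST_FILE = r"C:\ClusterStorage\Volume1\testfile.bin"  # hypothetical path
BLOCK = 4096        # 4 KiB reads, aligned to 4 KiB offsets
DURATION = 30       # seconds to run

size = os.path.getsize(TEST_FILE)
blocks = size // BLOCK

ops = 0
deadline = time.time() + DURATION
with open(TEST_FILE, "rb", buffering=0) as f:
    while time.time() < deadline:
        f.seek(random.randrange(blocks) * BLOCK)
        f.read(BLOCK)
        ops += 1

print(f"~{ops / DURATION:.0f} random 4 KiB reads per second")
```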

Performance with spindles is essentially a math problem, but you also have to factor in rotational and transfer latencies.
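To put rough numbers on that, here is a back-of-the-envelope sketch using typical datasheet figures for average seek time and RPM, not values measured on your array:

```python
# Back-of-the-envelope random IOPS per spindle: service time is roughly
# the average seek time plus half a rotation. The seek times below are
# typical datasheet values, not measurements from this MD3200.
def spindle_iops(avg_seek_ms, rpm):
    rotational_latency_ms = 60_000 / rpm / 2   # half a revolution
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000 / service_time_ms

print(f"15K SAS    : ~{spindle_iops(3.5, 15_000):.0f} random IOPS")  # ~180
print(f"7.2K NL-SAS: ~{spindle_iops(8.5, 7_200):.0f} random IOPS")   # ~80
```

That per-spindle ceiling is why a tool like IOMeter, which reports IOPS at fixed block sizes and queue depths, makes it much easier to see whether an array is delivering what its spindle count suggests.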

My post on using IOMeter might be helpful

vmpete.com/.../iometer-as-good-as-you-want-to-make-it

8 Posts

October 16th, 2013 09:00

OK. I'll try this tool and post the results here. Thanks.

685 Posts

October 17th, 2013 13:00

As long as you follow the guide and everything is cabled properly, there isn't much to do as far as performance tuning goes, since these are direct-attached SAS connections. If it were an iSCSI array there would be more things to check, but with SAS it's mostly just the physical side. I look forward to seeing what results you get from IOMeter.

13 Posts

October 27th, 2013 20:00

The Dell PowerVault MD3200 series are definitely capable arrays. We are currently running two of them connected to two R710 servers and one R720 server, and they fly. I reckon it will be a configuration or cabling issue. I also suggest you update the firmware to the latest generation 3, connect two cables from each server to the storage and enable MPIO.

In this configuration, we are able to saturate the SAN at 1,200 MB/s using IOMeter.

You also seem to be running too many servers on a very low spindle count. Looking at your first disk group, you said it had four disks in RAID-5, which means three disks usable with one's worth of capacity as parity, and you have six Windows Server VMs running on them. Even with basic maths, that means two OS instances per disk before you even hit the parity write penalty. Next is your RAID-1 with two servers: good for reads, but in this case still effectively one disk for two servers (and you said they are database servers). If you really need the best performance, RAID 1 wouldn't be recommended for a database; RAID 10 would be better.
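To illustrate with rough numbers, this is only a sketch assuming ~180 random IOPS per 15K spindle and a 70/30 read/write mix; your real workload will differ:

```python
# Rough effective random IOPS of a disk group, accounting for the RAID
# write penalty (RAID-1/10 = 2 back-end I/Os per write, RAID-5 = 4).
# 180 IOPS per 15K spindle and a 70/30 read/write mix are assumptions,
# not measurements from this array.
def effective_iops(disks, iops_per_disk=180, write_penalty=1, read_ratio=0.7):
    raw = disks * iops_per_disk
    return raw / (read_ratio + (1 - read_ratio) * write_penalty)

print(f"4-disk RAID-5 : ~{effective_iops(4, write_penalty=4):.0f} IOPS shared by six OS VHDs")
print(f"2-disk RAID-1 : ~{effective_iops(2, write_penalty=2):.0f} IOPS for the database VHD")
print(f"6-disk RAID-10: ~{effective_iops(6, write_penalty=2):.0f} IOPS if rebuilt as RAID-10")
```

Even the six-disk RAID-10 estimate stays well under a thousand random IOPS, which is why adding spindles matters more than any controller tuning.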

The best thing to do would be to buy more spindles (SAS, not NL-SAS) and create a Dynamic Disk Pool (generation 3 firmware).

Hope that helped.

David

8 Posts

October 28th, 2013 02:00

Hi David. Thank you for your answer.

We have two cables from each physical server to the MD3200, one for each module, so the two servers are connected to the two MD3200 modules.

We also have MPIO enabled with "Least Queue Depth". Failover Clustering is working fine.

All firmware (controller modules, NVSRAM, disks) is the newest. The storage and module firmware version is 07.84.47.60, and the NVSRAM version is N26X0-784890-004.

Last weekend we made a change in the RAID layout: we deleted all the SAS disk groups and created a single disk group with six SAS disks (the seventh as hot spare). We also changed the VM distribution so this new disk group holds only three VMs (two terminal servers and the main database server). With this new configuration we only get about 15% more IOPS on the main database server, so performance continues to be very poor.

I suppose buying more disks could be a way to go, but given the small improvement we got with two more disks and three fewer VMs, we would need to buy far too many disks. I think that with the configuration we have now we should be getting more performance, so I'm going to keep looking for an issue in my configuration or a hardware problem.

One question: I have noticed that in the MPIO configuration of each volume on each server there are two paths shown, one configured as Active/Optimized and the other as Suspended or Disabled (I don't know the exact word shown in English because my MPIO is in Spanish). The path configured as Active/Optimized shows normal I/O and byte counts, but the other path shows 0 I/Os and 0 bytes. I have tried to manually change the status of the second path to Active/Optimized too, but as soon as I click Apply it goes back to Suspended. Is this the normal behaviour of the Least Queue Depth MPIO policy? In theory it should be using both paths for I/O operations, so the counts on both paths would be greater than 0. It is not a problem with the HBAs, because if I look at the MPIO configuration of another volume on the same physical server the path statuses and counts are inverted, so both HBAs are working, but each one only for one volume.

13 Posts

November 3rd, 2013 01:00

Is there a reason why you are using Least Queue Depth instead of Round Robin?

We are currently using Round Robin and we have excellent performance.

Have you tried using IOMeter to troubleshoot further?
