
July 2nd, 2010 12:00

MD3000i performance...

We have an MD3000i with 15 300GB 15K SAS drives. Right now we're only using 4 of the drives and running approximately 15 Hyper-V VMs off them; the SAN is used for nothing else. The VMs all reside in one disk group with one LUN. The VMs are nothing too intensive: three SQL servers (one dev and two production) that are pretty low load, an Oracle development server with one person using it, two ColdFusion servers (one dev and one production), one or two IIS servers, and my VMM server, which is running as a VM but which I might reinstall on a physical server.

Anyway, I consider my environment pretty low load and the number of virtual machines relatively low, and I don't expect the number (15) to grow by much in the future. We may possibly virtualize our desktops at some point, but those would probably be stored on another MD3000i or an MD1000.

So, to get to my point, here's what I'm getting performance-wise. The top two charts are based on a 5-hour runtime during normal business hours. The lowest graph is what it is currently doing during normal business hours; it varies a little, but not too much. Latency is basically averaging 100ms. I've heard anything over 25ms is not good; at least a review of the SolarWinds product said over 25ms was bad for a SAN. I don't know whether my IOPS figure is good or bad, since we're a pretty low-load environment. Anyway, my thought is that my problems are the low number of spindles, it being a RAID 5 in a very write-heavy environment, and having only 1 LUN, so only one controller is even being used. My idea is to make the SAN have two RAID 10's (using 12 disks) and maybe two LUNs per disk group. I don't like the idea of one disk group or big LUNs myself.
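
Here's the back-of-the-envelope model I'm basing that on. The numbers are assumptions, not measurements: roughly 175 random IOPS per 15K SAS spindle, the usual write penalties of 2 for RAID 10 and 4 for RAID 5, and the controller cache ignored entirely.

    # Rough front-end IOPS estimate: current 4-drive RAID 5 group vs. one of the
    # proposed 6-drive RAID 10 groups, using the classic write-penalty model.
    PER_DISK_IOPS = 175                        # assumed per 15K SAS spindle
    WRITE_PENALTY = {"RAID 5": 4, "RAID 10": 2}

    def host_iops(drives, raid_level, write_fraction):
        """Approximate host IOPS a disk group can sustain at a given write mix."""
        backend = drives * PER_DISK_IOPS
        penalty = WRITE_PENALTY[raid_level]
        # Each host write costs `penalty` back-end I/Os, each host read costs 1.
        return backend / ((1 - write_fraction) + write_fraction * penalty)

    for raid_level, drives in (("RAID 5", 4), ("RAID 10", 6)):
        print(raid_level, drives, "drives, 50% writes:",
              round(host_iops(drives, raid_level, 0.5)), "IOPS")

On those assumptions the current 4-drive RAID 5 group tops out around 280 host IOPS at a 50% write mix, while a 6-drive RAID 10 group would be around 700.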

Thoughts?

 

 

847 Posts

July 6th, 2010 13:00

Take the IOPS do RAID6.....  

 

The latency may not be all that related, though.

 

Check to make sure the cache is running properly on all your VMs, and make sure the iSCSI targets are all what they should be. I'm not real sure on Hyper-V, but Dell has their own MPIO drivers, which could make a difference.

 

We are pounding our MD3000i's and never see 100ms, especially since going to vSphere (ESX 4.0).

 

 

15 Posts

July 5th, 2010 21:00

Two separate RAID 10 disk groups is a good idea (one on each controller). Moving the databases to the other disk group is good too. Latency is high; you should check your hardware (network) and HBA configuration.

July 7th, 2010 09:00

What do you mean by "take the IOPS do RAID6"? Do you mean accept the bad random writes I'm going to get in RAID 6 and get more spindles? What worries me about that is that, as you can see, we are easily reading less than 50% of the time. I'm not sure what you mean by the cache running properly on the VMs. The iSCSI targets look OK; I've checked them in the past and nothing has changed. Remember, since I only have one virtual disk, I'm only utilizing one controller right now. I think I'm using the Microsoft MPIO drivers, but I'm not sure.

I don't know how much to make of the performance statistics. I was in on the weekend, when our environment is basically dead, to do large file copies (15GB to 30GB) because I wanted to back up 2-3 VHDs to some local drives on one of our nodes. While doing that, the IOPS jumped up a lot (to around 500-1000 IOPS if I remember) and the latency dropped to 30-40ms if I remember right. Now, this was copying single files and is not a realistic situation during normal business hours. Could the performance figures shown be because we're not really utilizing things much, meaning that we're only doing say 40 IOPS during normal business hours? Remember, we've got around 15 very low load VMs running on the MD3000i.

27 Posts

July 9th, 2010 09:00

With all 15 of your Hyper-V VMs on a single 4-drive RAID group, it is highly likely your storage system is spending all its time seeking from one data area to another, with a resulting hit on latency while servicing random data requests. Also, with a single LUN you are sending all I/O through a single controller (each virtual disk is owned by a controller, with alternating assignments - which a single virtual disk cannot take advantage of). To solve this issue one can:

  • Create more RAID groups to spread out I/O activity over more disks and controllers. Create a virtual disk per RAID group (see the rough balancing sketch after this list).
  • Assign fewer VMs per RAID group / virtual disk.
  • Group high-activity VMs with low-activity VMs on a single virtual disk, minimizing the performance impact on the high-activity VMs.
  • Dedicate a RAID group and a single virtual disk to your highest-activity, most mission-critical VM.
  • Never put multiple sequential-access databases on the same disk group. (This reduces random access while servicing data requests from diverse databases.)
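
To make the first two bullets concrete, here is a rough sketch of balancing your VMs across two virtual disks, one owned by each controller. The per-VM IOPS numbers are hypothetical placeholders, not measurements from your array; in practice you would pull them from perfmon on the hosts or from the array's performance monitoring.

    # Hypothetical per-VM IOPS estimates (placeholders only).
    vm_iops = {
        "SQL-prod1": 60, "SQL-prod2": 45, "SQL-dev": 10, "Oracle-dev": 5,
        "CF-prod": 30, "CF-dev": 5, "IIS-1": 20, "IIS-2": 15, "VMM": 10,
    }

    # Greedy balance: place each VM (busiest first) on the lighter virtual disk,
    # so each controller ends up owning roughly half the I/O.
    groups = {"VD1 (controller 0)": [], "VD2 (controller 1)": []}
    load = {g: 0 for g in groups}
    for vm, iops in sorted(vm_iops.items(), key=lambda kv: -kv[1]):
        target = min(load, key=load.get)
        groups[target].append(vm)
        load[target] += iops

    for g, vms in groups.items():
        print(g, "~", load[g], "IOPS:", ", ".join(vms))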

 

847 Posts

July 9th, 2010 15:00

High latency is not usually an indicator of low load.

 

Just remember, a lot more spindles means faster writes as well. I have never found half the spindles in a RAID 10 group to be as fast as twice that amount in a RAID 6 / RAID 5 group, i.e., 14 drives in a RAID 10 only gives the IOPS of 7 drives.

 

Twice the spindles in a RAID 10 group would be best, but that is usually not practical because of all the drives you need to do it.

 

 

 

July 10th, 2010 15:00

My performance problem was because the write cache on the virtual disk was enabled but "(currently suspended)". This was visible in the Virtual Disks tab in MDSM. What made me look there was this article:

http://vmtoday.com/2009/06/ibm-ds3300-iscsi-write-performance-solved/

The IBM SAN and the MD3000i are based on the same underlying LSI array. We have dual controllers, though, unlike the author of that post. I called Dell and we tried multiple things mentioned in the article to get the write cache out of suspended mode: disabling and re-enabling it, pulling out a controller, etc. Finally it was suggested to totally power down the SAN and then power it back up to get it to realize it shouldn't have the write cache in suspend mode. This resolved it. Now latency is much better, and the same goes for IOPS when the VMs are actually doing something.

 

JOHNADCO,

 

You mention the RAID 5/6 vs RAID 10 comparison. From the performance figures I've run, when the write cache was in suspended mode I was seeing less than 50% reads. That seems to me like a case where RAID 10 would be better, since with the SAN populated with nothing but big VHDs and VM config files I would expect the I/O to be random. I like the idea of RAID 10 or RAID 6. If I take, say, 8 drives and put them in a RAID 10 I "lose" 4; put those same drives in a RAID 6 and I "lose" 2.

So the question, performance-wise for virtual machines, is whether eight 15K SAS drives in a RAID 6 would outperform those same drives in a RAID 10. I'm not sure whether 2 more spindles is enough to make up for the bad write performance I'm sure we'll get. Then again, I'm not sure performance in our environment is critical enough that we would even see a problem.
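
Here's a quick sanity check of the 8-drive question under the same simple write-penalty model (assuming roughly 175 random IOPS per 15K spindle, and ignoring the controller cache and any full-stripe write optimizations, which is exactly where real-world results like yours can come out differently from this napkin math):

    # 8 drives, ~60% writes, same assumptions as my earlier sketch.
    backend = 8 * 175                       # raw back-end IOPS for 8 spindles
    raid10 = backend / (0.4 + 0.6 * 2)      # RAID 10 write penalty = 2
    raid6  = backend / (0.4 + 0.6 * 6)      # RAID 6 write penalty = 6
    print(round(raid10), round(raid6))      # roughly 875 vs 350 host IOPS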

847 Posts

July 12th, 2010 09:00

That write cache suspended state on those single-controller MD3000i's is definitely a performance killer.

I usually try to make sure it has been checked when people say it really sucks. :)

 

With 8 drives it may be close to a wash. With a 14-drive disk group, 14-drive RAID 10 vs. 14-drive RAID 5? In our testing, the RAID 5 kicked its tail even on writes. We have since gone to RAID 6 and I have not retested, as we are in full production with it now. Write IOPS definitely dropped when going from RAID 5 to RAID 6.

July 14th, 2010 10:00

Yeah, but we're on dual controllers. Not sure how it got into that suspended state.

Yeah, I think I'd go RAID 5 if not for the fact that I'm worried about being vulnerable during the rebuild time. Granted, RAID 10 has the same issue in terms of being vulnerable, but I believe the rebuild time would be faster, and you basically have better odds that the next drive you lose won't cause loss of data, whereas with RAID 5 a second drive failure would definitely cause a loss. This is why I think RAID 6 or RAID 10 are my options. And in this scenario I could:

1. Create two 7-disk RAID 6 disk groups.

2. Create one 6-disk RAID 6 disk group and one 8-disk RAID 6 disk group.

 

Those seem to be the two options I would go with if I went RAID 6. I gain basically 1-2 more spindles per disk group compared to RAID 10, and I have the assurance that 2 drives could fail in the same disk group with no loss of data. Also, since we have 300GB drives, we get 300GB-600GB more space per disk group depending on the configuration.

What's holding me back a bit is that, from my performance tests with the SMcli tool, we are in a 60-70% write environment! Granted, it's a very low load, but it's still mostly writes. Maybe that's because we're so low load that it's mostly swap file activity in the OS VHDs? Just a wild guess.
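
For what it's worth, here's the capacity math on the layouts I'm weighing (300GB drives, with the 15th disk left over as a hot spare or unassigned in the 14-drive layouts; the RAID 10 line is my earlier 12-disk idea for comparison):

    # Usable capacity per layout, 300GB drives.
    # RAID 6 usable = (n - 2) * size; RAID 10 usable = (n / 2) * size.
    size = 300  # GB per drive

    layouts = {
        "2 x 7-disk RAID 6":      2 * (7 - 2) * size,               # two disk groups
        "6-disk + 8-disk RAID 6": (6 - 2) * size + (8 - 2) * size,  # two disk groups
        "2 x 6-disk RAID 10":     2 * (6 // 2) * size,              # two disk groups
    }
    for name, usable in layouts.items():
        print(f"{name}: {usable} GB usable total")

Either RAID 6 layout gives about 3000 GB usable across the two disk groups versus about 1800 GB for the two 6-disk RAID 10 groups.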
