1 Rookie

 • 

55 Posts

March 13th, 2007 06:00

Win2k3 Dynamic disks + ESX virtual hosts = converts to basic = data loss

ESX 3.0.1
Windows 2003 SR1 running virtually
Clariion CX700 flare 19

We present the ESX servers with 512 GB LUNs. The ESX servers present them to the virtual hosts as raw devices, and the virtual hosts stitch them together using dynamic disks. This worked fine for a few months, but 4 times in the last 3 weeks these hosts have, in the middle of the day, decided that the dynamic disks are now foreign basic disks, resulting in massive data loss. Microsoft is saying it's the SAN, because of disk read errors in the Windows logs. That seems highly unlikely: with 2 separate FC fabrics and 4 paths to each device, we would have had to lose both fabrics at exactly the same time, and for long enough for Windows to give up on the writes. Has anyone else seen this? Any thoughts?

410 Posts

March 13th, 2007 06:00

Make sure that you have the correct and latest driver AND firmware on the HBAs.
Intermittent disconnections can often be due to this.

EMC is likely to advise against dynamic disks...

Have the ESX host logs been looked into? Was there a performance issue when the disk errors occurred inside Windows?

4 Operator

 • 

2.1K Posts

March 13th, 2007 06:00

A question for Kiran here though...

Why is EMC likely to advise against Dynamic Disks? This is something I've never heard from them. We don't use them inside client hosts on ESX, but we use them extensively on native Win2K and Win2K3 hosts.

March 13th, 2007 06:00

"decided that the dynamic disks are now foreign basic disks resulting in massive data loss."

I'm not an expert on MS matters, but this does not sound right. To the best of my knowledge, only dynamic disks can be foreign. Converting to basic results in data loss, sure, but I can't see how this can happen without an administrative decision.
Anyway, if the use of dynamic disks is not really mandatory, I'd stay with basic disks on the SAN. Performance-wise, it should also be far more efficient to use MetaLUNs on the CLARiiON.

4 Operator

 • 

2.1K Posts

March 13th, 2007 06:00

I'd tend to agree with Ain. I'm not sure what is causing the problem, but one good solution would be to put the pieces together in a MetaLUN before presenting it to the host. This way the host will not have to deal with Dynamic disks and they still get the advantages of the striping (or concatenation) that they want.

410 Posts

March 13th, 2007 07:00

I am not sure about the root reason either, but EMC likes to see basic disks on Windows hosts and MetaLUNs on the array. Basic disks can still be expanded using diskpart.
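For anyone who hasn't done the diskpart expansion before, here is a rough sketch of the idea, written as a small Python helper that only builds the script text. The volume number is a made-up example (check `list volume` first), and this assumes the LUN has already been grown on the array and the guest has rescanned so the free space sits directly after the volume.

```python
def diskpart_extend_script(volume_number: int) -> str:
    """Build a diskpart script that extends one basic volume.

    volume_number is hypothetical: look it up with 'list volume' in diskpart.
    """
    if volume_number < 0:
        raise ValueError("volume number must be non-negative")
    # 'extend' with no arguments grows the selected volume into the
    # contiguous unallocated space that follows it on the same disk.
    return f"select volume {volume_number}\nextend\n"


# Example: produce the script text for (hypothetical) volume 3.
script = diskpart_extend_script(3)
```

Save the text to a file and run it with `diskpart /s <file>`. Try it on a test volume first.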

March 13th, 2007 07:00

Not sure if this helps, but EMC recommends adding (if not present) a TimeOutValue DWORD value under the key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\ in the registry and setting it to 60 seconds (3c in hex).
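As a sketch of what that setting looks like in practice, the Python helper below just builds the text of a .reg file for it. The function name is my own, and it assumes you want the 60 second value mentioned above. It does not touch the registry itself.

```python
# Key path as given in the post (without the trailing backslash).
REG_KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk"


def timeout_reg_text(seconds: int = 60) -> str:
    """Return .reg file text that sets TimeOutValue (REG_DWORD) to `seconds`."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("TimeOutValue must fit in a 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{REG_KEY}]",
        # .reg files write DWORDs as 8 hex digits, so 60 becomes 0000003c.
        f'"TimeOutValue"=dword:{seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


reg_text = timeout_reg_text(60)
```

Importing the resulting file on the Windows guest (or setting the value by hand in regedit) should be equivalent. As far as I know the disk driver reads this value at startup, so plan for a reboot.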

4 Operator

 • 

2.1K Posts

March 13th, 2007 07:00

Can you point to any documentation to support this? I'm not disagreeing, I just want to know if they are going to pull out some obscure document some day and say "Well, too bad about your data, but you should have followed this rule".

I don't remember ever seeing anything about this in the CLARiiON Best Practices documents.

1 Rookie

 • 

55 Posts

March 28th, 2007 10:00

They were using dynamic disks because we had presented the ESX cluster with a group of 512 GB LUNs and the Windows server admin wanted one device, so they stitched them together with dynamic disks to make a 1 TB volume. We didn't want to give them 1 TB LUNs, meta or otherwise, because if you start having a performance issue with a LUN that big, it takes a week to migrate it to some other part of the CLARiiON. (Hey EMC, where's my in-frame virtualization so I don't have to keep track of all this??)

1 Rookie

 • 

55 Posts

March 28th, 2007 10:00

ESX has been looking into the logs. Don't know that they found anything. No issues on the Clariion or the FC fabric when this event occurred.