6 Posts

January 22nd, 2014 16:00

Dell EqualLogic failover testing best practice

Hi All,

I would like to know the best practice for failover testing with a Dell EqualLogic PS6110 array connected to two 10 Gb PowerConnect switches and to four VMware ESXi 5.5 hosts.

Please share your thoughts, as I have seen some really strange results when I unplugged the 10 Gb cable between one host and the switch. I have two 10 Gb NICs on my server, connected to redundant switches, and two iSCSI VMkernel ports have been created for redundancy.

6 Operator

9.3K Posts

January 22nd, 2014 20:00

What kind of inter-switch-link do you have? Are the switches stacked or do you use a LAG?

6 Posts

January 23rd, 2014 15:00

Hi Donald,

We have two R620 servers connected directly to one 10 Gb PowerConnect switch, and the storage is connected directly to the PowerConnect switch.

Issue:

A VM running on an ESXi 5.5 host loses ping connectivity when we unplug one network cable from a 10 Gb port on the host. We have two VMkernel ports created on the host, and they are redundant.

Yes, the I/O will pause momentarily, but since we are using the Dell EQL MEM, why does the failover take so long? It should be quick, and we should not be losing 20 pings to the VM.
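To put a number on the outage, the lost pings can be converted into an approximate duration. The sketch below is only an illustration: it assumes one ping per second (the usual default, not stated in this thread), and the function name `longest_outage` is made up for the example.

```python
def longest_outage(replies, interval_s=1.0):
    """Return (lost_pings, seconds) for the longest run of missed replies.

    replies    -- list of booleans, one per ping sent, in order
    interval_s -- assumed time between pings (1 second is an assumption)
    """
    longest = current = 0
    for ok in replies:
        # Reset the run on a reply, extend it on a miss.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest, longest * interval_s


# Example: 5 replies, 20 misses, 5 replies
sample = [True] * 5 + [False] * 20 + [True] * 5
print(longest_outage(sample))  # (20, 20.0)
```

With a one-second interval, 20 lost pings correspond to roughly 20 seconds without connectivity to the VM.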

6 Posts

January 26th, 2014 18:00

Hi Donald,

I appreciate your input on this.

The Dell engineer who deployed the EqualLogic storage performed the failover test below. I am surprised: where does it say that this is a valid failover test? It resulted in a VM crash.

- An R620 connected to a 10 Gb switch, with six VMs running and two iSCSI vmk ports on the host.

- Connected to only one 10 Gb switch and to only one EqualLogic array.

- All the VMs were up and running. The engineer pulled one 10 Gb Ethernet cable out, put it back within 2 minutes, and then pulled the other 10 Gb cable out, and we lost all access to storage.

Why did we lose access to storage when one 10 Gb link was already connected to the server?

Also, what is the proper EqualLogic failover test that Dell performs and that you suggest customers run?

How long should it take an ESXi 5.5 host to fail over to another port, and then to fail back when the link comes back up?

I will wait for an update from you.

6 Posts

January 27th, 2014 15:00

They unplugged the network cable from NIC 1 and plugged it back in after two minutes.

Two minutes after that, they pulled the cable out of the other network adapter.

At that point the datastores became inaccessible, as they hit an APD (All Paths Down) situation.
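One way to reason about this sequence is a small timeline model. The sketch below is not a diagnosis: it assumes that a path only becomes usable again some time after the cable is plugged back in (`recovery_delay_s`, a made-up parameter standing in for link negotiation and iSCSI re-login), and the NIC names `vmnic0` and `vmnic1` are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time_s: float
    nic: str
    action: str  # "pull" or "plug"


def active_paths_at(events, nics, t, recovery_delay_s):
    """Return the set of NICs whose storage path is usable at time t.

    Assumption: after a "plug" event the path needs recovery_delay_s
    seconds before it carries I/O again. All paths start healthy.
    """
    active = set()
    for nic in nics:
        usable_from = 0.0
        for ev in sorted(events, key=lambda e: e.time_s):
            if ev.nic != nic or ev.time_s > t:
                continue
            if ev.action == "pull":
                usable_from = float("inf")  # path is down until replugged
            elif ev.action == "plug":
                usable_from = ev.time_s + recovery_delay_s
        if t >= usable_from:
            active.add(nic)
    return active


# The test as described: pull NIC 1, replug after 2 minutes,
# pull the other NIC 2 minutes later.
nics = ["vmnic0", "vmnic1"]
events = [
    Event(0, "vmnic0", "pull"),
    Event(120, "vmnic0", "plug"),
    Event(240, "vmnic1", "pull"),
]
for delay in (30, 180):
    print(delay, active_paths_at(events, nics, 240, delay))
```

In this model, the second pull causes an APD only if the first path has not finished recovering by then. Whether that is what happened here would have to be confirmed from the host and array logs.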

I am wondering why the Dell EqualLogic documentation does not say anything about pulling the network cable out of the server as part of failover testing, when Dell's own engineers are doing this onsite.

The VMkernel ping response has changed in ESXi 5.1 and 5.5, and if the storage is not aware of that when it handles failover for the login process, then I am sure that is what causes the issue on the EqualLogic side, as the array takes its time to respond to the initiator.

Can you test this in your test lab and validate?

- Use a Dell server with two uplinks.

- Associate each uplink with one VMkernel port and build a few VMs on the host. While the VMs are writing to disk, pull a network cable from the host; you will see that the I/O freezes and then resumes on the other VMkernel port. (The question is how the array determines that the initiator has failed, and how long it takes to determine that.)

(Also, in ESXi 5.1 and 5.5 the vmk ping is blocked by default, so how will EqualLogic make the call to fail over?)
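The length of the I/O freeze in the test above can be measured rather than estimated. The following is a minimal sketch, assuming it is run inside one of the test VMs against a file on the datastore being tested. The helper names `timed_writes` and `longest_stall` are made up for this example and are not part of any Dell or VMware tool.

```python
import os
import time


def timed_writes(path, count=100, block=b"x" * 4096, pause_s=0.1):
    """Append `count` blocks to `path`, fsync each one, and return
    the monotonic completion time of every write."""
    times = []
    with open(path, "ab") as f:
        for _ in range(count):
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the write down to storage
            times.append(time.monotonic())
            time.sleep(pause_s)
    return times


def longest_stall(completion_times):
    """Return (gap_seconds, gap_start) for the largest gap between
    consecutive write completions, or (0.0, None) if there are fewer
    than two samples."""
    if len(completion_times) < 2:
        return 0.0, None
    ordered = sorted(completion_times)
    gaps = [(b - a, a) for a, b in zip(ordered, ordered[1:])]
    return max(gaps)
```

Start `timed_writes` with a large enough `count`, pull the cable while it runs, then pass the returned list to `longest_stall`. The largest gap approximates how long I/O was frozen before it resumed on the other VMkernel port.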
