January 22nd, 2014 16:00
Dell EqualLogic failover testing best practice
Hi All,
I would like to know the best practice for failover testing with a Dell EqualLogic PS6110 array connected to two 10 Gb PowerConnect switches and four VMware ESXi 5.5 hosts.
Please share your thoughts, as I have seen some really strange results when I unplugged the 10 Gb cable between one host and the switch. I have two 10 Gb NICs in my server, connected to redundant switches, and two iSCSI VMkernel ports created for redundancy.
Dev Mgr
January 22nd, 2014 20:00
What kind of inter-switch-link do you have? Are the switches stacked or do you use a LAG?
sydney23
January 23rd, 2014 15:00
Hi Donald,
We have two R620 servers connected directly to one 10 Gb PowerConnect switch, and the storage is connected directly to the PowerConnect switch.
Issue:
A VM running on the ESXi 5.5 host loses ping connectivity when we unplug one network cable from a 10 Gb port on the host. We have two VMkernel ports created on the host, and they are redundant.
Yes, I/O will pause momentarily, but when we are using the Dell EqualLogic MEM, why does the failover take so long? It should be quick, and we should not be losing 20 pings to the VM.
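To put a number on the outage rather than counting lost pings by eye, something like the following may help. It is a minimal Python sketch, assuming pings are sent at a fixed interval (1 second here) and each result has been recorded as True (reply) or False (lost). The function name `longest_outage` and the sample data are hypothetical and not part of any Dell or VMware tool.

```python
from typing import Sequence, Tuple


def longest_outage(replies: Sequence[bool], interval_s: float = 1.0) -> Tuple[int, float]:
    """Return (lost_count, seconds) for the longest run of consecutive lost pings.

    `replies` holds one entry per ping sent: True for a reply, False for a loss.
    `interval_s` is the assumed time between pings.
    """
    longest = 0
    current = 0
    for ok in replies:
        if ok:
            current = 0  # a reply ends the current run of losses
        else:
            current += 1
            longest = max(longest, current)
    return longest, longest * interval_s


if __name__ == "__main__":
    # Hypothetical capture: replies, then 20 losses after the cable pull, then replies.
    sample = [True] * 5 + [False] * 20 + [True] * 5
    lost, seconds = longest_outage(sample)
    print(f"longest outage: {lost} pings, about {seconds:.0f} s")
```

Running the same measurement for each cable pull gives a comparable failover time per path, which is easier to discuss with support than a rough ping count.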
sydney23
January 26th, 2014 18:00
Hi Donald ,
I appreciate your input on this.
The Dell engineer who deployed the EqualLogic storage performed the failover test below. I am surprised, because I cannot see where it says that this is a valid failover test, and it resulted in a VM crash.
- R620 connected to the 10 Gb switch, with six VMs running and two iSCSI vmk ports on the host.
- Connected to only one 10 Gb switch and only one EqualLogic array.
- All the VMs were up and running. The engineer pulled one 10 Gb Ethernet cable, put it back within 2 minutes, and then pulled the other 10 Gb cable, at which point we lost all access to storage.
Why did we lose access to storage when one 10 Gb link was already plugged back into the server?
Also, what is the proper EqualLogic failover test that Dell performs and that you suggest customers run?
How long should it take an ESXi 5.5 host to fail over to another port, and then to fail back when the link comes back up?
I will wait for an update from you.
sydney23
January 27th, 2014 15:00
They unplugged the network cable from NIC 1 and plugged it back in after two minutes.
Two minutes after that, they pulled the cable from the other network adapter.
At that point the datastores became inaccessible, as they hit an APD (All Paths Down) condition.
I am wondering why Dell EqualLogic does not say anything about pulling the network cable out of the server as part of failover testing, when their engineers are doing this onsite.
The VMkernel ping response has changed in ESXi 5.1 and 5.5. If the storage is not aware that it has to handle that failover for the login process, then I am sure that is what is causing the issue on the EqualLogic side, as the array is taking its time to respond to the initiator.
Can you test this in your lab and validate it?
- Use a Dell server with two uplinks.
- Associate each uplink with one VMkernel port and build a few VMs on the host. While the VMs are writing to disk, pull a network cable from the host; you will see the I/O freeze and then resume on the other VMkernel port. (The question is how the array determines that the initiator has failed, and how long it takes to determine that.)
(Also, in ESXi 5.1 and 5.5 the vmk ping is blocked by default, so how will EqualLogic make the call to fail over?)
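For the test above, the length of the I/O freeze can be measured from inside a VM instead of being estimated. The following is a minimal Python sketch, assuming it runs in a guest whose virtual disk lives on the EqualLogic datastore and that a cable is pulled while it is writing. The function name `timed_writes` and the probe file name are hypothetical; the script only times synchronous writes and knows nothing about the array or MEM.

```python
import os
import tempfile
import time
from typing import List


def timed_writes(path: str, count: int, block_size: int = 4096) -> List[float]:
    """Write `count` blocks to `path`, forcing each to disk, and return
    the latency of every write in seconds."""
    block = b"\0" * block_size
    latencies: List[float] = []
    with open(path, "wb") as f:
        for _ in range(count):
            start = time.monotonic()
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the write through to storage
            latencies.append(time.monotonic() - start)
    return latencies


if __name__ == "__main__":
    # Short demonstration run against a temporary file. For a real test,
    # point `path` at a file on the disk under test and raise `count`.
    with tempfile.TemporaryDirectory() as tmp:
        probe = os.path.join(tmp, "probe.bin")
        latencies = timed_writes(probe, count=20)
        print(f"writes: {len(latencies)}, slowest: {max(latencies):.3f} s")
```

The slowest write during the cable pull approximates how long the guest saw I/O stall, which can then be compared between the first and second cable pull.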