Suggestions on network failover?

Question

Since I've had differing opinions on this from our solutions architects, I thought I would pose this question to the forums and see what everyone thinks/does. We're just about to go into production on our Celerra NX4 array, and we're doing our final failover tests. We have the following setup configured for the primary DM:

cge0, cge1, cge2 are all configured as solo iSCSI links (2 on our primary SAN VLAN, 1 on a secondary SAN VLAN we set up just for this array at EMC's request). cge3 is a dedicated replication link. We've got 2 GbE switches with static trunks into each other and Rapid Spanning Tree in place. All 4 links in the primary DM are on 1 switch, and all 4 links on the standby DM are on the other switch. When we simulated a switch failure, nothing failed over to the standby DM until we kicked it over manually.

We don't care about maximum performance; we just need to make sure the array stays up in case of a switch failure. Should I see about setting up LACP or static trunks to take care of it, or should I just swap the cables for cge1 between the primary and standby DM? Would it be best to set up a Fail-Safe Network (FSN) link? Let me know if more detail is required...

Rainer_EMC · Accepted Answer

sorry - I meant Powerlink proper

You'll find reference architecture documents for apps like Exchange or SQL Server or others, with or without VMware, in the Solutions area, for example:

Reference Architecture: EMC Solutions for Microsoft Exchange 2007 and Microsoft SQL Server 2005 E-mail, Database, and File Sharing on EMC Celerra NX4
in Home > Solutions > EMC Solutions for Midsize Enterprises > EMC Solutions for Exchange 2007

Some good information about Celerra and VMware is in the Solutions Guide: VMware ESX Server Using EMC Celerra Storage Systems
in Home > Support > Technical Documentation and Advisories > TechBooks Solutions Guides
http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/H5536-vmware-esx-srvr-using-emc-celerra-stor-sys-wp.pdf

It's normal that you will get different suggestions - it very much depends on your config and requirements.

The general goal of surviving a switch failure can be solved at either the Celerra (FSN), the switch (stack LACP) or the client (MCS) level. As always, there are pros and cons regarding functionality, simplicity, cost ...

ltfields · Answer

And for clarification, we're only using this array for iSCSI, no CIFS or NFS.

Rainer_EMC · Answer

works as expected and documented - a network link failure isn't a trigger for a Data Mover failover

what applications are you using and what's the client network setup?
i.e. are they connected to both switches as well, and how?

normally I would try to use the reference architecture

LACP won't help - unless you have one of the few switches that can build an LACP trunk across two switches

FSNs would certainly help but halve your network bandwidth

connecting cge1 to the other switch could help, *if* you associate both interfaces with the same iSCSI target and configure both on the client initiator *and* it's intelligent enough to switch over

spanning tree doesn't make a difference here
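
To make the cabling options above a bit more concrete, here is a minimal Python sketch of which primary Data Mover ports still have link after a switch dies. It is only a toy model: the switch names `sw1`/`sw2` and the function name are made up for illustration, and it says nothing about whether the initiator actually moves to the surviving portal.

```python
# Toy model: which Data Mover interfaces still have link after a switch dies.
# Switch names "sw1"/"sw2" are hypothetical, chosen only for illustration.

def surviving_interfaces(cabling, failed_switch):
    """Return the interfaces whose switch is still up, sorted by name."""
    return sorted(nic for nic, switch in cabling.items() if switch != failed_switch)

# Layout from the original post: every iSCSI port of the primary DM on one switch.
all_on_one = {"cge0": "sw1", "cge1": "sw1", "cge2": "sw1"}

# Alternative discussed above: cge1 recabled to the second switch.
cge1_moved = {"cge0": "sw1", "cge1": "sw2", "cge2": "sw1"}

for name, cabling in [("all on sw1", all_on_one), ("cge1 on sw2", cge1_moved)]:
    print(name, "-> after sw1 fails:", surviving_interfaces(cabling, "sw1"))
```

With everything on one switch nothing survives; with cge1 moved, one port stays up, but only helps if that portal is configured on the initiator and the initiator can switch over to it.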

ltfields · Answer

It's a shame a network link failure isn't a trigger for a DM failover, since they have their own internal network between the DMs. I would think that you could set a beacon IP or something, and if the primary DM can't see it, it could make a check with the standby DM. If the standby DM can see it, it would initiate a failover. But I digress...

Applications are an ESX Cluster of HP Proliant Blades with Windows VMs on them that have iSCSI volumes attached. So I have the ESX initiator connected, and the Microsoft initiator inside the VMs (only about 4 VMs out of 35). I do have all 3 interfaces associated with the same iSCSI targets, all 3 target IPs are configured on my initiators, and I have unlicensed PowerPath on the Windows VMs. Would that be enough to just swap the physical cables? And is there a doc somewhere with the "Reference Architecture"?

BillStein-Dell · Answer

It's a shame a network link failure isn't a trigger
for a DM failover, since they have their own internal
network between the DMs. I would think that you
could set a beacon IP or something, and if the
primary DM can't see it, it could make a check with
the standby DM. If the standby DM can see it, it
would initiate a failover. But I digress...

The standby Data Mover is not configured, so there is no IP stack loaded for the network interfaces, so there is no way we can check the network connectivity in that way.

You could possibly script something on the Control Station, but I would advise against that.

Applications are an ESX Cluster of HP Proliant Blades
with Windows VMs on them that have iSCSI volumes
attached. So I have the ESX initiator connected, and
the Microsoft initiator inside the VMs (only about 4
VMs out of 35). I do have all 3 interfaces
associated with the same iSCSI targets, all 3 target
IPs are configured on my initiators, and I have
unlicensed PowerPath on the Windows VMs. Would that
be enough to just swap the physical cables? And is
there a doc somewhere with the "Reference
Architecture"?

Since you don't need the bandwidth, and I don't think you'd be maxing out the bandwidth on any one GbE for an iSCSI LUN, I'd suggest cutting down the number of interfaces for your ESX cluster and implementing FSN. You wouldn't need LACP for bandwidth, since performance isn't your concern. You could implement some trunks (Cisco EtherChannel) if you wanted port redundancy, but to safeguard against a switch failure, I would strongly recommend FSN.
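
As a rough illustration of why FSN covers a switch failure, the sketch below models an FSN device as one active and one standby link cabled to different switches. This is a conceptual toy in Python, not Celerra code; the class name, the switch names `sw1`/`sw2`, and the choice of `cge0`/`cge1` as members are all assumptions.

```python
class FailSafeNetwork:
    """Toy active/standby pair: traffic uses one link and moves when it loses link."""

    def __init__(self, primary, standby):
        # Each member is an (interface, switch) tuple.
        self.members = [primary, standby]
        self.active = primary

    def switch_failed(self, switch):
        """Mark a switch as down and fail over if the active link was on it."""
        self.members = [m for m in self.members if m[1] != switch]
        if self.active not in self.members:
            # Move to the surviving member, or None if nothing is left.
            self.active = self.members[0] if self.members else None
        return self.active


fsn = FailSafeNetwork(primary=("cge0", "sw1"), standby=("cge1", "sw2"))
print(fsn.active)                # ('cge0', 'sw1')
print(fsn.switch_failed("sw1"))  # ('cge1', 'sw2')
```

Only one link carries traffic at a time, which is the bandwidth cost mentioned earlier in the thread. The upside is that the failover happens on the array side, so it does not depend on the initiator picking a different path.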

ltfields · Answer

I had a feeling FSN would be the best way to go. I didn't see anything resembling a 'Reference Architecture' doc in a couple of KB searches. Is the Celerra 'Reference Architecture' anywhere specific?

ltfields · Answer

Agreed, I appreciate the insight. As soon as I can get our project manager to call me back (he's on the flaky side), I'm going to get a resource to look at it closely with us and bless the FSN setup for our array.

Rainer_EMC · Answer

for some other tuning for ESX take a look at the TechNote I just posted:
http://forums.emc.com/forums/thread.jspa?threadID=103401
