1 Rookie

 • 

43 Posts

March 31st, 2013 09:00

How to fail DM over on link-down event?

Hi all, we're implementing a unified VNX5500 system.  The block side is working great and the file side is functional, but during testing I found that an all-links-down situation does not qualify for data mover (DM) failover.  I have two DMs, one active and one standby, each with two 10gig links in an LACP trunk, and each DM's trunk goes to a separate switch.  Obviously, if the primary switch fails and goes down, I'd like the DM to fail over.

Is my only option to get rid of the LACP trunk and switch to a fail-safe network (FSN), with the primary DM connected to each switch and the same for the second?

Thanks!

6 Operator

 • 

8.6K Posts

March 31st, 2013 11:00

Cross-stack (cross-switch) LACP?

Are you getting more network bandwidth than a single 10Gb interface's worth?

1 Rookie

 • 

43 Posts

March 31st, 2013 14:00

I'm not getting more than 10 Gbit/s currently, but I should be; that's a whole other issue lol.  We have lots of flash disk and are not seeing anywhere near the throughput we should, whether doing block or file.  Still working on that with support.

The switches are independent, so I can't do LACP across them.  If my only option is FSN, I can do it.  Maybe I could create my own link-down failover by running a cron job on the control station that polls the DM's interfaces for link status once per minute and fails the DM over manually if they go down?  It wouldn't be ideal, but it should still be fast enough for NFS to recover fine.

6 Operator

 • 

8.6K Posts

April 1st, 2013 04:00

You could get your EMC contact to file an RPQ.

I think there was a way to get failover on network problems via RPQ, but I don't recall the details.

1 Rookie

 • 

43 Posts

April 2nd, 2013 10:00

Support confirmed that they will not support a cron job on the control station that detects link failures and fails the box over, so that's that lol.  I have a spare 2-port 10gig card, so I'm going to put it in the primary data mover, connect it to the secondary switch, and build an FSN with the LACP trunk as primary and the single 10gig link as failover.  I think it's kind of dumb that a total network failure doesn't count as a valid condition for failing the data mover over.  It means you either give up ports to build an FSN across different switches or buy extra ports you wouldn't otherwise need, and 10gig ports aren't cheap.

In case anyone is interested in doing the cron job thing anyway because they don't have spare ports, budget, etc., it's pretty simple.  On your control station, run:

server_sysconfig server_2 -virtual -info lacp0

"lacp0" is what I called my trunk, which is made up of the two ports fxg-2-0 and fxg-2-1; you'll have to change the command to match whatever you called yours.  The output will be something like this:

server_2 :
*** Trunk lacp0: Link is Up ***
*** Trunk lacp0: Timeout is Short ***
*** Trunk lacp0: Statistical Load Balancing is IP ***
Device     Local Grp   Remote Grp Link  LACP Duplex Speed
------------------------------------------------------------------------
fxg-2-0    10000       55510      Up    Up   Full    10000 Mbs
fxg-2-1    10000       55510      Up    Up   Full    10000 Mbs

If both links go down, it looks like this:

server_2 :
*** Trunk lacp0: Link is Down ***
*** Trunk lacp0: Timeout is Short ***
*** Trunk lacp0: Statistical Load Balancing is IP ***
Device     Local Grp   Remote Grp Link  LACP Duplex Speed
------------------------------------------------------------------------
fxg-2-0    10000       55510      Up    Down Full    10000 Mbs
fxg-2-1    10000       55510      Up    Down Full    10000 Mbs

Notice that the per-device Link column still shows "Up" even though the links are really down; only the LACP column changes to "Down".  Fortunately, the trunk status line ("*** Trunk lacp0: Link is ... ***") stays "Up" after a single link failure and changes to "Down" only when both (or all) links fail.  You could run a script from cron once per minute to check that line and, if it reports the trunk down, execute:

server_standby server_2 -activate mover

which will fail your data mover over.
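For anyone who wants a starting point, here is a rough sketch of what such a check script might look like.  It only uses the two commands shown above and parses the "Link is Up/Down" trunk line from the sample output.  The function names (trunk_state, check_once) and the MOVER/TRUNK variables are made up for illustration, and it assumes the output format matches what is shown in this post.  Keep in mind that support said they won't support this, and the sketch does not handle what happens on later runs once the standby has already taken over.

```shell
#!/bin/bash
# Sketch only, not a supported or tested failover mechanism.
# MOVER and TRUNK are the names used in this thread; change them for your system.
MOVER="server_2"
TRUNK="lacp0"

# Hypothetical helper: reads `server_sysconfig ... -virtual -info` output on
# stdin and prints "Up", "Down", or "Unknown" based on the
# "*** Trunk <name>: Link is <state> ***" line for the given trunk.
trunk_state() {
    local trunk="$1" state
    state=$(awk -v t="Trunk ${trunk}:" '
        index($0, t) && /Link is/ {
            # The word after "is" is the trunk state.
            for (i = 1; i < NF; i++) if ($i == "is") { print $(i + 1); exit }
        }')
    case "$state" in
        Up|Down) echo "$state" ;;
        *)       echo "Unknown" ;;   # missing line or unexpected format
    esac
}

# Hypothetical helper: poll once and fail over only on an explicit "Down".
# "Unknown" (for example, the query itself failed) deliberately does nothing.
check_once() {
    local state
    state=$(server_sysconfig "$MOVER" -virtual -info "$TRUNK" | trunk_state "$TRUNK")
    if [ "$state" = "Down" ]; then
        server_standby "$MOVER" -activate mover
    fi
}

# Demo against the "both links down" output from this post (no VNX needed).
trunk_state "$TRUNK" <<'EOF'
server_2 :
*** Trunk lacp0: Link is Down ***
*** Trunk lacp0: Timeout is Short ***
*** Trunk lacp0: Statistical Load Balancing is IP ***
EOF

# On the Control Station, a cron-driven script would call this instead:
# check_once
```

The demo at the bottom prints "Down" for the sample output.  Failing over only on an explicit "Down", and never on "Unknown", is a deliberate choice so that a failed or garbled query does not trigger a failover by itself.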
