Before I start, I'm not referring to just a single link within an LACP or FSN config; we get that. This is specific to the (yes, rare, I know) situation where multiple physical paths become inactive or failed (not just quiet from low traffic but actually disconnected). Maybe the datamover is connected to dozens of physical switches, so you've got one interface that is DOA (no live paths at all) while the other interfaces are all physically and logically fine. I've seen plenty of questions asked and answered about this topic in other, mostly older, posts. Last time I tested this (years ago) it didn't cause a failover, which is obviously by design and the default "setting". But I'm wondering if there is some hidden or otherwise undocumented parameter that can be toggled so that, if an interface loses all/both physical paths making up the interface (the devices show as dead, down, unreachable, etc.), the datamover would assume this is a major problem on those paths and fail over to the standby DM, in hopes of finding at least one workable device under either the LACP or FSN team when they come up on the standby DM?
That's a long-winded way of asking, but our VMware admins have nagged us about this: if you lose both "storage" paths on a particular ESXi host, it says "whoa, I'm done" and commences failover to a node that is working. We've scoured the best practices for VMware over NFS and the CLI reference for the latest VNXs, and nothing bubbles up in any searches either.
I think you could set up a cron job on an administration host to check those paths and, if they are down, trigger a failover command via SSH to the control station.
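A minimal sketch of that cron approach. The path IPs, control-station hostname, and account are all made up, and the `server_standby ... -activate mover` command is from memory of the Celerra/VNX CLI, so verify the exact syntax against your own documentation before trusting any of this:

```python
#!/usr/bin/env python3
"""Cron-driven check: if every physical path behind an interface is dead,
SSH to the control station and activate the standby datamover.
All hostnames/IPs and the failover command are assumptions."""
import subprocess

# Hypothetical switch-facing probe addresses, one per leg of the LACP/FSN team.
PATHS = ["192.0.2.11", "192.0.2.12"]
CONTROL_STATION = "control-station.example"            # hypothetical hostname
FAILOVER_CMD = "server_standby server_2 -activate mover"  # verify this syntax!

def path_alive(addr):
    """One ping with a short timeout; True if the path endpoint answers."""
    return subprocess.call(
        ["ping", "-c", "1", "-W", "2", addr],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0

def all_paths_down(alive_flags):
    """Failover is justified only when *every* path is dead."""
    return bool(alive_flags) and not any(alive_flags)

def main():
    flags = [path_alive(a) for a in PATHS]
    if all_paths_down(flags):
        # SSH to the control station and kick over to the standby DM.
        subprocess.call(["ssh", "nasadmin@" + CONTROL_STATION, FAILOVER_CMD])

if __name__ == "__main__":
    main()
```

Note the decision rule deliberately requires all paths dead, so a single flapping port never fires it.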
Nothing will be in the logs if something is broken upstream. Your physical links could be just fine while the upstream router is hosed.
The event only helps if the datamover interface itself shows link down, not if an intermediate switch/router is broken.
I think those are all possible solutions; there's nothing as simple as setting a flag to 1 to trigger the failover (or 0 for the default behavior).
The situation we're thinking of comes into play when you have a limited number of physical slots in the datamover and limited options for splitting your physical paths across separate physical NICs in the blade. So, for the VMware-over-NFS example: isolated VLAN, just a pair of switches in the path, no routers. Somebody happens to down your two active ports while everything else is fine; first the physical ports go down, then the interface goes down, so the trigger is simple: if the interface is down, do the failover. What would be bad, though, is if the interface comes up on the standby DM and then it starts to ping-pong. Maybe I just answered why this isn't a desirable parameter to expose, although a programmer could also add an option to say "don't fail back from standby", which would prevent that from being a problem.
We've never customized any nas_events, but the idea looks like it may be able to trigger the failover. I just can't see how it could be tweaked to prevent the ping-pong situation.
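One way any scripted trigger (cron or event-driven) could avoid the ping-pong is a one-shot latch: record that a failover has already fired and refuse to fire again until an admin clears the marker, which is effectively the "don't fail back from standby" option. A sketch, with the marker path being an arbitrary choice:

```python
"""One-shot latch to stop a scripted failover trigger from ping-ponging.
The marker file path is a made-up example."""
import os

STATE_FILE = "/var/tmp/dm_failover.latch"  # hypothetical marker path

def should_failover(paths_down, latch_present):
    """Fire only on a full path loss, and only if we haven't fired before."""
    return paths_down and not latch_present

def arm_latch(path=STATE_FILE):
    """Drop the marker so later runs stay quiet until an admin removes it."""
    with open(path, "w") as f:
        f.write("failover already triggered; remove this file to re-arm\n")

def latch_present(path=STATE_FILE):
    return os.path.exists(path)
```

Once the standby DM is active and the original fault is fixed, an admin deletes the marker to re-arm the check; until then the script stays silent no matter what the interfaces do.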
When you say interface, are you referring to the logical interface on the datamover or the actual physical NIC on the datamover? Because I don't think the interface goes down when a physical link changes state to "link down". Not that I think it makes any difference in the context of what you are trying to do.
Two 10Gb optical ports in LACP. From our observations, a physical device flapping creates its own events. Plus, if you are reconfiguring LACP on the switches, I recall seeing the logical interface also go up/down depending on what's happening on the switch ports. I could have been seeing things incorrectly, though.