NFSv4 Failover

Question

Hello,

We are currently implementing an Isilon solution. It consists of two clusters mirrored with SyncIQ and controlled with Superna Eyeglass.

Does anyone have information about how the cluster handles a node failure with NFSv4 and a static pool, and about how to handle stale NFS mounts on the client when failing over to another cluster?

regards,

Martin

crklosterman · Accepted Answer

In the case of a node failure, you would want to use a dynamic SmartConnect zone for NFSv4, but only on OneFS 8.0 and above. That will give you the best possible behavior. Andrew's response was directed at failing over clients between clusters, not within the same cluster.

~Chris Klosterman

Advisory SA

EMC Enablement Team

chris.klosterman@emc.com

Newday3000 · Answer

Hi Shamrok,

We have tested NFSv3 and NFSv4 with Eyeglass for failover and failback to find the best method to avoid stale mounts, assuming a planned failover where the source and destination clusters are both healthy. You can unmount with the force and lazy options (umount -f -l).

We have a script engine that can SSH to a host and unmount and remount exports. The example here shows the scripts used to automatically remount after failover; they run inside Eyeglass: http://documentation.superna.net/eyeglass-isilon-edition/Eyeglass-Isilon-Edition#TOC-Script-Engine-Understanding-Remote-Execution-to-hosts

This video shows it working
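For readers who want to see the shape of the unmount and remount step outside of Eyeglass, here is a minimal Python sketch. It is not the Eyeglass script engine and not Superna's script; the function name, host name, mount point, and ssh user are all hypothetical. It assumes a Linux client where `umount -f -l` is available and where the export is listed in /etc/fstab, so that `mount <mount point>` alone is enough to remount it. The function only builds the command lines, so they can be reviewed before anything is executed.

```python
import shlex


def build_remount_commands(host, mount_point, ssh_user="root"):
    """Return ssh command lines for a force+lazy unmount followed by a remount.

    Assumption: the mount point has an /etc/fstab entry on the client,
    so a bare "mount <mount_point>" re-resolves and remounts the export.
    """
    target = f"{ssh_user}@{host}"
    quoted = shlex.quote(mount_point)  # protect paths that contain spaces
    return [
        ["ssh", target, f"umount -f -l {quoted}"],
        ["ssh", target, f"mount {quoted}"],
    ]


if __name__ == "__main__":
    # Hypothetical client and mount point, printed for review only.
    for cmd in build_remount_commands("client01.example.com", "/mnt/isilon"):
        print(shlex.join(cmd))
```

To actually run the commands, each list could be passed to `subprocess.run(cmd, check=True)` once you are happy with what it prints.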

If you have more questions let me know.

Regards

Andrew

Martin0904 · Answer

Hey Chris, thanks for the information. So now I need to wait for a new 8.x release, because I don't really want to run production on a 0.0 release... :-) Regards, Martin

crklosterman · Answer

I'm not saying you have to wait for 8.x to use NFSv4; however, you'll probably be far happier with the failover behavior if you do. The logic here is that NFSv4, unlike v3, is a stateful protocol, in that it expects the server to maintain session state information. Since each node runs its own NFS daemon(s), that session state info is unique per node. Traditionally our recommendations and documentation have always been to use static SmartConnect zones with stateful protocols and dynamic zones with stateless ones. In 8.x and above we can now keep that session state information for NFSv4 in sync across multiple nodes.

Most mountd daemons used today will also still behave in a v3 manner: if the IP they are talking to goes down, you simply get a stale mount. They won't try a new nslookup and connect to a different node. SMB clients like Windows Explorer do exactly this, and some NFS clients do as well, but not all. Therefore dynamic is the better choice for NFSv4, because at least you won't get stale mounts even if some session state information goes missing, and 8.x is the better choice because that session state information won't be lost in the first place.

The other thing to consider, if you haven't turned on NFSv4 on your cluster yet, is your client mount options. If you haven't specified vers=3 in your fstab or automount files today and you turn on v4, then the next time you unmount and remount those exports, most clients will negotiate the highest supported version of the protocol, so they'll start using v4 without you doing anything. In general that's a bad thing, because v4 supports ACLs, the performance is usually different, etc. It's not an Isilon-specific behavior either; it's more about the clients than the server.

Make sense? And yes, I'm totally on board with not upgrading production to a new major code release until it has settled out for a while. OneFS 8.0.0.0 has been very stable as far as I've heard so far, and for a greenfield deployment that's not yet in production I wouldn't hesitate to use it in the slightest. For a production upgrade, though, I would personally wait until either the first MR release, or until an MR in the 8.x family reaches the milestone of being 'target code'.
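To check which clients would silently move to v4, something like the following Python sketch could be run against a copy of each client's fstab. It is only an illustration: the function name and the sample entries are hypothetical, it looks only at entries whose filesystem type is `nfs`, and it treats an entry as pinned if its options include `vers=` or `nfsvers=`. Automount maps are not covered.

```python
def nfs_entries_without_version(fstab_text):
    """Return mount points of nfs entries that do not pin a protocol version."""
    unpinned = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "nfs":
            continue
        options = fields[3].split(",")
        if not any(opt.startswith(("vers=", "nfsvers=")) for opt in options):
            unpinned.append(fields[1])
    return unpinned


if __name__ == "__main__":
    # Hypothetical fstab content for illustration.
    sample = """
# NFS exports
isilon.example.com:/ifs/data /mnt/data nfs rw,hard 0 0
isilon.example.com:/ifs/home /mnt/home nfs rw,vers=3 0 0
/dev/sda1 / ext4 defaults 0 1
"""
    for mount_point in nfs_entries_without_version(sample):
        print(f"{mount_point}: no vers= option, may negotiate v4 on remount")
```

Any mount point it reports is one where adding vers=3 before enabling v4 on the cluster would keep the client on its current protocol version.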

~Chris Klosterman

Advisory SA

EMC Enablement Team

Isilon
