
January 5th, 2018 03:00

NFSv4 - Dynamic Pools

Hi!

A customer connects an Oracle Linux 6.9 host over NFSv4 to a cluster running OneFS 8.0.1 through a dynamic pool.


When the node the host is connected to is rebooted, the mount hangs and we see the following messages in /var/log/messages:


Oct 11 14:36:25 xxxxxxxx kernel: [4560064.202013] nfs: server server.domain.com not responding, still trying

Oct 11 14:36:25 xxxxxxxx kernel: [4560064.315006] nfs: server server.domain.com not responding, still trying


Below is the content of the /etc/fstab file:


server.domain.com:/backup      /backup           nfs4    rw,hard,intr,rsize=65538,wsize=65538,timeo=5,retrans=4 0 0


The host must be rebooted to be able to access the exports again.


Has anyone had issues like this using NFSv4 with dynamic pools?

Thanks in advance!


January 5th, 2018 06:00

NFSv4 is a stateful protocol, so it expects session state to be maintained. Truthfully, the answer here depends on how the NFS client on the OS in question actually handles a disconnection of an NFSv4 mount. In a perfect world, because it's a stateful protocol, the mount should be in a static pool, and the client should realize the IP is down, do a new DNS lookup, and connect to another IP address. That's the way Windows handles it over SMB (again a stateful protocol). But many Linux OSes still perform mounts in a v3 fashion: once they do a DNS lookup and get an IP address back, they hold onto that IP address for dear life. So if the mount is in a dynamic pool and the IP fails over, you'll lose session state info, but the connection won't go stale. So I guess the question is which is worse: a stale mount or loss of session state info?

BTW, you have rsize and wsize set to only 64K there. Isilon supports up to 1M by default, so you may want to increase those if you have larger files.

Also, are you mounting server.domain.com? It should be smartconnectzonename.domain.com:/ifs/backup or something like that. I'm sure that was just put there as an example, but I want to be sure you're not connecting to the cluster name itself, which usually has no proper DNS registration.
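For illustration, here is a sketch of how the fstab entry might look with both suggestions applied. It assumes the SmartConnect zone name is smartconnectzonename.domain.com and the export path is /ifs/backup (both placeholders, not real values), sets rsize/wsize to 1M (1048576 bytes), and otherwise keeps the options from the original entry unchanged:

```
# Placeholder SmartConnect zone name and export path; substitute your own.
# rsize/wsize raised from 64K to 1M (1048576 bytes); other options unchanged.
smartconnectzonename.domain.com:/ifs/backup      /backup           nfs4    rw,hard,intr,rsize=1048576,wsize=1048576,timeo=5,retrans=4 0 0
```

Whether the larger sizes actually help depends on the workload, so it is worth testing with your own files before rolling it out.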

~Chris
