Unsolved
This post is more than 5 years old
NFSv4 - Dynamic Pools
Hi!
A customer connects an Oracle Linux 6.9 host using NFSv4 to a cluster running OneFS 8.0.1 through a dynamic pool.
When the node that the host is connected to is rebooted, the mount hangs and we see the following messages in /var/log/messages:
Oct 11 14:36:25 xxxxxxxx kernel: [4560064.202013] nfs: server server.domain.com not responding, still trying
Oct 11 14:36:25 xxxxxxxx kernel: [4560064.315006] nfs: server server.domain.com not responding, still trying
Below is the content of the /etc/fstab file:
server.domain.com:/backup /backup nfs4 rw,hard,intr,rsize=65538,wsize=65538,timeo=5,retrans=4 0 0
The host must be rebooted before it can access the exports again.
Has anyone had issues like this using NFSv4 with dynamic pools?
Thanks in advance!
crklosterman
January 5th, 2018 06:00
NFSv4 is a stateful protocol, so it expects session state to be maintained. The real question here is how the NFS client on the OS in question handles a disconnection of an NFSv4 mount. In a perfect world, because it's a stateful protocol, the mount should be in a static pool: the client would realize the IP is down, do a new nslookup, and connect to another IP address. That's how Windows handles it over SMB (again, a stateful protocol). But many Linux OSes still perform mounts in a v3 fashion: once they do an nslookup and get an IP address back, they hold onto that IP address for dear life. So if it's in a dynamic pool and fails over, you'll lose session state info, but the connection won't go stale. So I guess the question is which is worse: a stale mount or loss of session state info?
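One way to see whether the client is holding onto a single IP is to compare the address recorded for the mount with what DNS currently returns. This is only a rough sketch: `mounted_addr` is a hypothetical helper name, and it assumes the client lists an `addr=` option for the mount in /proc/mounts, which Linux NFS clients normally do.

```shell
# Hypothetical helper: print the server IP an NFS mount is pinned to,
# taken from the addr= option of a /proc/mounts-style line read on stdin.
# The leading "[ ,]" keeps clientaddr= from matching.
mounted_addr() {
    sed -n 's/.*[ ,]addr=\([^, ]*\).*/\1/p'
}

# Usage on the client (mount point and hostname taken from the original post):
#   grep ' /backup ' /proc/mounts | mounted_addr   # IP the mount is using
#   getent hosts server.domain.com                 # IP DNS hands out right now
```

If the first address stays the same after the node goes down while DNS has moved on to another IP, the client did not re-resolve the name.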
BTW, you have rsize and wsize values of only 64K there. Isilon by default supports up to 1M, so you may want to increase those if you have larger files.
Also, are you really mounting server.domain.com? It should be smartconnectzonename.domain.com:/ifs/backup or something like that. I'm sure that was just put there as an example, but make sure you're not connecting to the cluster name itself, which usually has no proper DNS registration.
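Putting those two suggestions together, the fstab entry might look something like the line below. Treat it as a sketch: the zone name and export path are placeholders, 1048576 is simply 1M expressed in bytes, and the other options are copied unchanged from the original entry.

```
# Sketch only: replace the zone name and export path with the real ones
smartconnectzonename.domain.com:/ifs/backup /backup nfs4 rw,hard,intr,rsize=1048576,wsize=1048576,timeo=5,retrans=4 0 0
```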
~Chris