Stale NFS Mounts

Question

Hi, Isilon support community. I have an ongoing problem where NFS mounts sporadically go stale: I am unable to stat the directory where the file system is mounted, and I have to take drastic measures to resolve the issue. Do other people experience this same issue? How do people typically deal with it? Just a couple of notes:

1) I am doing the mount across a layer 3 boundary (through a firewall). I am fairly confident the firewall is not the culprit (i.e. sessions expiring in a state table), but wanted to mention this regardless.

2) I am mapping to the DNS name in the NFS entry. Here is the entry from my /etc/fstab:

bulkstorage-nfs.uchicago.edu:/ifs/uofc/cri/group/arabidopsis /group/arabidopsis nfs defaults,nfsvers=3,intr,rsize=1048576,wsize=1048576 0 0

Thank-you so much for your input and guidance as I continue to troubleshoot this issue.

Thank-you,

Dan Sullivan
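For anyone wanting to detect the "unable to stat" symptom described above without hanging a shell, here is a minimal Python sketch. It assumes a Linux client; the helper name `probe_mount` and the 5-second timeout are illustrative choices, not anything from Isilon. The stat call runs in a daemon thread because a stat on a hung hard mount may block indefinitely.

```python
import os
import threading


def probe_mount(path, timeout=5.0):
    """Return "ok", "error", or "timeout" for a stat of `path`.

    "error" means stat returned with an OSError (for example ESTALE or ENOENT).
    "timeout" means stat did not return in time, which is what a hung
    hard mount typically looks like from the client side.
    """
    outcome = {}

    def worker():
        try:
            os.stat(path)
            outcome["status"] = "ok"
        except OSError:
            outcome["status"] = "error"

    # Daemon thread: if stat never returns, it will not block interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    return outcome.get("status", "timeout")
```

Calling `probe_mount("/group/arabidopsis")` from cron or a monitoring check would be one way to log when the mount first stops responding.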

dynamox · Answer

Is bulkstorage-nfs.uchicago.edu a SmartConnect zone name? What happens if you mount an individual Isilon node by its IP address?

dsulli99 · Answer

Dynamox,

Thank-you for your reply.

Is this a common practice? I am concerned that we would lose our ability to load balance on the system using round-robin DNS. I suppose I can do this to test, but I wouldn't consider it a long-term solution...

To answer your question, yes, it is a smartconnect zone name.

dynamox · Answer

Yep, test only, maybe to isolate whether one particular node is having issues. It would be another data point if you could force the NFS mount to use UDP instead of TCP.

dynamox · Answer

I remember now that we had issues with Linux clients that were running iptables while Isilon was trying to talk to the client using rpcinfo. Isilon techs told us that we had to allow Isilon to talk to our clients, so we ended up modifying iptables to allow port 111 (rpcbind) over both TCP and UDP.

dsulli99 · Answer

iptables definitely isn't the issue, it's not running on this box.

I'm also interested in a 'general response' from the community as to whether stale NFS mounts have been an ongoing problem that other people have experienced. dynamox, have you been able to resolve this issue permanently, or just reduce the frequency of occurrence?

Thank-you,

Dan

dynamox · Answer

We did not experience stale mounts, but our syslogs on Linux servers were filling up with 'kernel:lockd :server 10.11.12.13 not responding,timeout'. Isilon support said the cluster needs to be able to query these 3 services (RPCBIND, Status, NLOCKMGR) on the client in order for RPC locking to work correctly. They would run 'rpcinfo -p client_ip' from the Isilon and see if those 3 services were available.
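As a rough illustration of that check, the Python sketch below scans captured `rpcinfo -p` output for the three services. It matches on the standard RPC program numbers (100000 for rpcbind/portmapper, 100024 for status, 100021 for nlockmgr) rather than the service-name column. The function name `missing_rpc_services` is hypothetical, and the sample output in the usage note is made up for illustration, not taken from a real client.

```python
# Standard RPC program numbers for the three services named above.
REQUIRED_PROGRAMS = {
    100000: "rpcbind",
    100024: "status",
    100021: "nlockmgr",
}


def missing_rpc_services(rpcinfo_output):
    """Return the names of required services absent from `rpcinfo -p` text."""
    seen = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows look like: program vers proto port service
        # The header row is skipped because its first field is not numeric.
        if len(fields) >= 4 and fields[0].isdigit():
            seen.add(int(fields[0]))
    return [name for prog, name in REQUIRED_PROGRAMS.items() if prog not in seen]
```

Feeding it output that lists only programs 100000 and 100024 would return `["nlockmgr"]`, which would point at the lock manager being unreachable or not running on the client.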

christopher_ime · Answer

I simply wanted to point out that the general best practice is to use "hard" mounts (versus soft mounts). I'm not certain whether it is the default, but it is one that I explicitly add to the mount options when mounting a file system via NFS from an Isilon. Then again, it may not be related, but its absence stood out to me.

I am probably entirely off-base here.

christopher_ime · Answer

Also, looking a second time at the rsize and wsize, those exceed the (advertised) maximums supported on the cluster.

Have a look at the following KB article:

emc14001361: "Best practices for NFS client mount settings"

dynamox · Answer

Are those rsize/wsize values the same regardless of whether the client and/or the Isilon is connected via 10G or 1G links?

dsulli99 · Answer

I'm fairly certain that hard mounts are the default, although I can't prove it. Also, the behavior that I am experiencing suggests that it's mounted hard anyway. I'll try explicitly configuring this just to be thorough. Thank-you for the suggestion.
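One way to settle the hard-versus-soft question on a Linux client is to read the options the kernel reports for the mount in /proc/mounts, which normally include `hard` or `soft` along with the negotiated rsize and wsize. The sketch below is a minimal parser under that assumption; the function name `nfs_mount_options` is hypothetical.

```python
def nfs_mount_options(mounts_text, mount_point):
    """Return the option dict for an NFS mount at `mount_point`, or None.

    `mounts_text` is the content of /proc/mounts. Flag options such as
    "hard" map to an empty string; key=value options map to the value.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point and fields[2].startswith("nfs"):
            options = {}
            for item in fields[3].split(","):
                key, _, value = item.partition("=")
                options[key] = value
            return options
    return None
```

For example, `opts = nfs_mount_options(open("/proc/mounts").read(), "/group/arabidopsis")` followed by `"hard" in opts` would show whether the mount is hard, and `opts.get("rsize")` would show the read size actually in effect.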

dsulli99 · Answer

This is excellent information.  Thank-you.

christopher_ime · Answer

Hey Dynamox,

First let me make note of a KB article I tucked away when I started my journey with Isilon.

emc14000137: "Increase NFS performance with Isilon storage using client tuning"

We recommend setting the value on the host no larger than the advertised maximum sizes on the cluster, which are 128KB for rsize and 512KB for wsize (even though the two sides will eventually negotiate). So instead of specifying 1MB in the client mount options, as the OP has, and letting it negotiate down to 128KB or 512KB (or whatever value max_rsize or max_wsize is set to on the cluster), we recommend simply matching the value on the cluster. As was explained to me in the past, despite the sysctl commands on the cluster, you can't specify larger than 512KB. We provide the method to change the values on the cluster to match the largest supported value on the host (but no larger than 512KB). So if you plan to specify 32KB on the host, match that on the cluster with the corresponding sysctl command. Basically, we want to reduce or eliminate the overhead of the host and the cluster negotiating.
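To make the comparison concrete, here is a small Python sketch that checks an fstab-style option string against those limits. The defaults of 128KB and 512KB are the figures quoted in this post; the function name `oversized_options` is hypothetical, and the limits are parameters so they can be changed if the cluster advertises different maximums.

```python
KIB = 1024


def oversized_options(options, max_rsize=128 * KIB, max_wsize=512 * KIB):
    """Return {option: (requested, limit)} for rsize/wsize above the limits.

    `options` is the comma-separated option field from /etc/fstab.
    """
    limits = {"rsize": max_rsize, "wsize": max_wsize}
    problems = {}
    for item in options.split(","):
        key, _, value = item.partition("=")
        if key in limits and value.isdigit() and int(value) > limits[key]:
            problems[key] = (int(value), limits[key])
    return problems
```

Run against the OP's options (`defaults,nfsvers=3,intr,rsize=1048576,wsize=1048576`), this flags both values: 1048576 requested versus 131072 for rsize and 524288 for wsize.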

As for your comment about 1Gb vs 10Gb: no, this doesn't change anything related to the conversation above. What it does influence, as you know, is the decision to use jumbo frames between server and cluster or, as documented in the KB article above, to modify the TCP buffer sizes.

christopher_ime · Answer

Hmmm... this area is fairly quiet, even though it only recently went live. There are already quite a few unanswered posts, and I would like to see the legacy Isilon gurus chip in. I've been involved with Isilon for several months now, but I also ask a lot of questions (and at times simply regurgitate what I've learned). The internal forums are buzzing with activity, and I would love to see that overflow here as well.

dynamox · Answer

I agree, the Isilon Google Groups forum has more activity than this one. Maybe you can ping your comrades to pop in here from time to time and help us out so we can all learn.

AndrewChung · Answer

What version of OneFS are you running? Depending on the version, there have been a large number of fixes to NFS. The stale mounts might be caused by the NFS daemon on the Isilon dying. Look in /var/crash for core dumps of the NFS process. Otherwise, you should be able to have an NFS mount stay up indefinitely.

In terms of the rsize and wsize, for OneFS 6.5 and below, Christopher is correct. We don't support larger than 512k for writes and 128k for reads. This changes in OneFS 7.0, where we do support 1 MB window sizes. The rsize is 128k because 128k is the data stripe size on the Isilon, and 512k is a multiple of 128k. Jumbo frames help performance, as does using TCP instead of UDP (in general).
