NFS3: File locking problem

Question

Hello, we have an NFS3 share on a VNX which we've mounted on some Linux clients (CentOS 6). This was running fine for more than a year. Unfortunately I now get file locking errors on the clients. It looks like the number of errors increases under 'heavy' NFS3 load: at first users cannot get a lock on single files, later they have problems with more files. After a reboot NO locks are available at all! Enabling the debugging options shows that the server answers requests or function calls with, e.g., 'server returns status 33554432' and 'clnt proc returns -37'. So far it looks like a client problem, but changing the client's IP address makes things work again! Does anyone have an idea how to solve this? Maybe there's a possibility to 'reset' the server lock state(s)? Thank you, Markus
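For readers trying to interpret the two debug values quoted above, here is a minimal Python sketch. It rests on two assumptions that the thread does not confirm: that the lockd debug message prints the NLM status in raw network byte order on a little-endian client, and that -37 is a negated Linux errno. The helper names (`swap32`, `describe_nlm_status`, `describe_errno`) are made up for this illustration. Under those assumptions both values point the same way: the server reports that it has no lock resources available.

```python
import struct

# NLM status codes as defined by the NLM protocol (nlm_stats).
NLM_STATUS = {
    0: "LCK_GRANTED",
    1: "LCK_DENIED",
    2: "LCK_DENIED_NOLOCKS",
    3: "LCK_BLOCKED",
    4: "LCK_DENIED_GRACE_PERIOD",
}

# A few Linux errno numbers relevant to NFS locking (Linux numbering only).
LINUX_ERRNO = {
    11: "EAGAIN",
    13: "EACCES",
    37: "ENOLCK",
    116: "ESTALE",
}


def swap32(value):
    """Reverse the byte order of an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def describe_nlm_status(raw):
    """Assumption: 'raw' was logged without byte-order conversion."""
    swapped = swap32(raw)
    return NLM_STATUS.get(swapped, "unknown (%d)" % swapped)


def describe_errno(code):
    """Assumption: 'code' is a negated Linux errno, as kernel code returns."""
    return LINUX_ERRNO.get(abs(code), "unknown (%d)" % abs(code))


print(describe_nlm_status(33554432))  # LCK_DENIED_NOLOCKS
print(describe_errno(-37))            # ENOLCK
```

33554432 is 0x02000000, which becomes 2 once the bytes are swapped. This only decodes the messages; it does not say whether the client or the server is at fault.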

Peter_EMC · Answer

Was anything changed on the VNX, e.g. a code upgrade?

Rainer_EMC · Answer

One way to reset the server's lock state would be to reboot the Data Mover.

I assume you have already checked the Data Mover logs for errors.

There is a service command to dump lockd statistics and locks, but you would have to open a service request for that.

pacoag · Answer

Hi Kudie

In our case, it seems to be a bug in how some Linux distributions handle locks (our server OS is Red Hat 5). It can lead to a deadlock and even block access to the NFS threads.

Engineering provided a workaround that seems to have solved the problem.

As Rainer said, please call support and open a service request to confirm whether the workaround applies in your case.

kudie · Answer

Hello Peter, hello Rainer,

I've asked our VNX admins: there was indeed an update of the VNX software at the end of September.

This might match the time when the problems started, but that is not certain: my colleagues suspected a "client" software problem at first and used some workarounds. It took several weeks until they talked to me...

@Rainer:

Thank you for the hints. We'll check these and give you some feedback.

Best regards,

Markus

Rainer_EMC · Answer

IMHO a client bug is more likely – the VNX lockd code has been quite stable over the last few years.

kparrotte · Answer

Hi...

You have probably already found resources like the ones below, but I still wanted to share what I found with some basic research online.

I am providing this info without any warranty of its viability, but I am hopeful it will help you.

https://wiki.archlinux.org/index.php/NFS/Troubleshooting

The following is for CentOS 7, but it seemed at least related to what you are describing:

0009631: AUTOFS - NFS status monitor for NFS v2/3 locking - CentOS Bug Tracker

CentOS 6: NFS Locking Problem | Field Notes of a Sysadmin

Best of Luck... If you have some success, please let us know.

kudie · Answer

Hi everybody,

thank you very much for the hints and links.

We've restarted the Data Mover on the VNX and installed the most recent updates for CentOS 6. I don't like this kind of "bug fix", but the systems are working properly now - at least for the last 7 days ;-)

Hopefully the client update(s) have solved the problem, but I'll have to see how things behave in the next days/weeks. Unfortunately it takes some time to see if we're running into the same problem again. In case the locking problem occurs again, we'll open a service request. I'll keep you up to date....
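Since the problem takes weeks to reappear, a small probe run periodically on a client could show early when locks stop being granted. The following is only a sketch under stated assumptions: Python 3 is available on the client, the path passed on the command line is a file on the affected NFS mount, and `probe_lock` is a hypothetical helper name. It tries a non-blocking POSIX lock, which on an NFSv3 mount goes through the lock manager, and reports the errno name on failure.

```python
import errno
import fcntl
import os
import sys


def probe_lock(path):
    """Try to take and release a non-blocking exclusive POSIX lock.

    Returns (True, None) on success, or (False, reason) on failure.
    """
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, "open failed: " + name
    try:
        # lockf() uses fcntl-style byte-range locks on the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True, None
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The probe file path is supplied by the caller, e.g. from cron.
    ok, reason = probe_lock(sys.argv[1])
    print("lock ok" if ok else "lock FAILED: %s" % reason)
    sys.exit(0 if ok else 1)
```

A result of ENOLCK would correspond to the "no locks available" symptom described in this thread. EAGAIN or EACCES would more likely mean that another process holds the lock.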

Best regards,

Markus

kudie · Answer

Hello,

a short update from my side: I've seen the problem again, it took nearly one month to show up.

But now one of the most recent Linux kernel updates (2.6.32-573.10.1.el6) contains a bug fix for an NFS locking problem in the kernel; maybe this solves our problem as well:

- [fs] NFS: Hold i_lock in nfs_wb_page_cancel() while locking a request (Benjamin Coddington) [1273721 1135601]
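To see which clients already run a kernel at or above that build, a rough comparison of the release string can help. This is a sketch, not a definitive check: it assumes the fix first shipped in 2.6.32-573.10.1.el6 as the changelog entry suggests, it only makes sense for kernels of the same el6 series, and `release_key` and `has_fix` are hypothetical names. Whether that fix addresses the locking problem in this thread is not established.

```python
import platform
import re

# Kernel build named in the post as containing the NFS fix.
FIXED_RELEASE = "2.6.32-573.10.1.el6"


def release_key(release):
    """Turn '2.6.32-573.10.1.el6.x86_64' into (2, 6, 32, 573, 10, 1)."""
    # Drop the distribution tag and architecture ('.el6.x86_64').
    head = re.split(r"\.el\d", release, maxsplit=1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", head))


def has_fix(release, fixed=FIXED_RELEASE):
    """True if 'release' is at or above the build named in the post."""
    return release_key(release) >= release_key(fixed)


if __name__ == "__main__":
    running = platform.release()
    print(running, "->", "at or above" if has_fix(running) else "below",
          FIXED_RELEASE)
```

For example, `has_fix("2.6.32-573.8.1.el6.x86_64")` is false, while `has_fix("2.6.32-642.el6.x86_64")` is true.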

But: it will take at least another month to see whether it is solved or happens again....

Best regards,

Markus

VNX
