Peter_EMC
674 Posts
0
November 13th, 2015 04:00
Was anything changed on the VNX, e.g. a code upgrade?
Rainer_EMC
4 Operator
•
8.6K Posts
1
November 13th, 2015 13:00
One way to reset the server's lock state would be to reboot the data mover.
I assume you have already checked the data mover logs for errors.
There is a service command to dump lockd statistics and locks, but you would have to open a service request for that.
pacoag
1 Message
0
November 16th, 2015 00:00
Hi Kudie,
In our case, it seems to be a bug in how some Linux distributions handle locks (our server OS is Red Hat 5). It can lead to a deadlock and can even block access to the NFS threads.
Engineering provided a workaround that seems to have solved the problem.
As Rainer said, please call support and open a service request to confirm whether the workaround applies in your case.
kudie
4 Posts
0
November 16th, 2015 00:00
Hello Peter, hello Rainer,
I've asked our VNX admins: there was indeed an update of the VNX software at the end of September.
This might match the time when the problems started, but we can't be sure: my colleagues suspected a "client" software problem at first and used some workarounds. It took several weeks until they talked to me...
@Rainer:
Thank you for the hints. We'll check these and give you some feedback.
Best regards,
Markus
Rainer_EMC
4 Operator
•
8.6K Posts
0
November 16th, 2015 17:00
IMHO a client bug is more likely; the VNX lockd code has been quite stable over the last few years.
kparrotte
40 Posts
1
November 16th, 2015 19:00
Hi...
You have probably already found resources like the ones below, but I still felt I should share what I found while doing some basic research online.
I am providing this info without any warranty of its viability, but I am hopeful it will help you.
https://wiki.archlinux.org/index.php/NFS/Troubleshooting
The following is for CentOS 7, but it seemed to be at least a "relative" of what you are describing:
0009631: AUTOFS - NFS status monitor for NFS v2/3 locking - CentOS Bug Tracker
CentOS 6: NFS Locking Problem | Field Notes of a Sysadmin
Best of Luck... If you have some success, please let us know.
kudie
4 Posts
0
November 23rd, 2015 03:00
Hi everybody,
thank you very much for the hints and links.
We've restarted the data mover on the VNX and installed the most recent updates for CentOS 6. I don't like this kind of "bug fix", but the systems are working properly now - at least for the last 7 days ;-)
Hopefully the client update(s) have solved the problem, but I'll have to see how things behave over the next days/weeks. Unfortunately it takes some time to see whether we run into the same problem again. If the locking problem occurs again, we'll open a service request. I'll keep you up to date...
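Since the problem can take weeks to reappear, a small lock probe run periodically on a client might flag a hanging lock sooner. The following is only a sketch under assumptions: the probe path is hypothetical and must point to a scratch file on the affected NFS mount, the timeout is arbitrary, and it assumes that a stuck lock request can still be killed with SIGKILL. The lock attempt runs in a child process so that a hang in the lock call does not block the probe itself.

```python
import subprocess
import sys

# Child script: try to take and release an exclusive lock on the given file.
# Exit code 0 = lock obtained, 2 = lock held by someone else.
_CHILD = r"""
import fcntl, sys
with open(sys.argv[1], "a") as fh:
    try:
        fcntl.lockf(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit(2)
    fcntl.lockf(fh, fcntl.LOCK_UN)
"""

def probe_lock(path, timeout=10.0):
    """Return 'ok', 'busy', 'error' or 'timeout' for a lock attempt on path."""
    child = subprocess.Popen([sys.executable, "-c", _CHILD, path])
    try:
        code = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The lock request did not return in time; this is the case of interest.
        child.kill()
        return "timeout"
    if code == 0:
        return "ok"
    if code == 2:
        return "busy"
    return "error"

if __name__ == "__main__":
    # Hypothetical path: replace with a scratch file on the NFS mount.
    print(probe_lock("/mnt/nfs/lockprobe.tmp"))
```

A `timeout` result would only indicate that the lock call hung on this client; it does not tell whether the cause is on the client or on the data mover.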
Best regards,
Markus
kudie
4 Posts
0
December 17th, 2015 23:00
Hello,
A short update from my side: I've seen the problem again; it took nearly a month to show up.
But now one of the most recent Linux kernel updates (2.6.32-573.10.1.el6) contains a bug fix for an NFS locking problem inside the kernel; maybe this solves our problem as well:
- [fs] NFS: Hold i_lock in nfs_wb_page_cancel() while locking a request (Benjamin Coddington) [1273721 1135601]
But it will take at least another month to see whether it is solved or happens again...
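To check which clients already run a kernel at or above the release named above, a comparison along these lines might help. It is a minimal sketch: `release_key` and `has_fix` are illustrative names, the parsing assumes EL6-style release strings such as `2.6.32-573.10.1.el6.x86_64`, and a "newer" result on other distributions or major versions says nothing about whether this particular patch is included.

```python
import platform
import re

# Release taken from the kernel changelog entry quoted above.
FIXED_RELEASE = "2.6.32-573.10.1.el6"

def release_key(release):
    """Turn '2.6.32-573.10.1.el6.x86_64' into a comparable tuple of ints."""
    # Keep only the leading numeric parts: upstream version, dash, build number.
    match = re.match(r"(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version, build = match.groups()
    return (tuple(int(p) for p in version.split(".")),
            tuple(int(p) for p in build.split(".")))

def has_fix(release, fixed=FIXED_RELEASE):
    """True if release is the same as or newer than the fixed release."""
    return release_key(release) >= release_key(fixed)

if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    try:
        print(running, "has fix:", has_fix(running))
    except ValueError as exc:
        print(exc)
```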
Best regards,
Markus