NFS copy issue - source and destination differ

Question

Hi everyone,

I would like to share an issue we have been encountering for months. When we copy a file over NFS and compare md5 checksums, the source and the copy sometimes do not match. When the file is a video file, for example, this results in drops. The same copy over Samba works (same md5) but takes much longer.

I am wondering whether any of you know how much room for manoeuvre I have with the NFS parameters to try to resolve this issue.

I am, of course, at your disposal if anyone needs additional information (setup, CLI results, etc.). All ideas, leads, and magic potions are welcome.

Thank you for your precious help.

Best,
Pierre

Context

OneFS version

isi-tvs-1-1# isi version
Isilon OneFS v6.5.5.10 B_6_5_5_138(RELEASE): 0x605050000A0008A:Mon Aug 27 16:20:22 PDT 2012    root@fastbuild-02.west.isilon.com:/build/mnt/obj.RELEASE/build/mnt/src/sys/IQ.amd64.release

NFS settings

Environment

The computers used for copying are Apple computers. Several macOS versions show the same issue (from 10.6.8 to 10.10.2).

NFS was chosen for its performance compared with Samba.

For some Macs, we follow the Isilon recommendations regarding NFS, namely:

nfs.conf
nfs.client.mount.options=nfssvers=3,tcp,async,nolock,rw,rdirplus,rsize=65536,wsize=65536

sysctl.conf
net.tcp.delayed_ack=0
net.inet.tcp.recvspace=131072
net.inet.tcp.sendspace=131072

For some others, freshly reinstalled, we don't.

Whatever these parameters are, the issue occurs randomly, most often on big files.

NB: the NFS copy runs and finishes without error.

cadiletta · Accepted Answer

The detail here that this only happens with 100GB+ file sizes is an interesting one. I can certainly imagine that testing takes a long time with that much data to transfer. I assume there is a workflow that regularly transfers files of this size, which is why this has become an issue?

I would highly recommend working with your account team to renew the support for your Isilon hardware - you really are missing a lot of great new things without the support.

With files of that size, I wonder if there isn't a better way to transfer that would offer the reliability you seek. I certainly don't know why an NFS connection would not be reliable, but at that size, there could be further options to take advantage of. I'm thinking of packaging the file as a tarball, then transferring it? Or using FTP instead of NFS? I will admit this line of thinking is a bit of wild speculation, but my instinct is that with this file size, there may be something in this version of NFS transfers that reduces the stability. I have no information to back that up, just a gut feeling that I'd have to research further, but it does support what you are saying.

cadiletta · Answer

From the output it looks like you are using NFSv3 with the nolock and async mount options. This would be the first change I would test. The async write method tells the client it does not need to confirm packet reception by the server and can just continue sending packets until the file is completely transferred. Therefore, if there are any dropped packets or other confusion during transfer, the resulting file would be corrupted or at least different from the original.

I would drop the async mount option entirely and then test a transfer. It should take slightly longer but transfer reliably.

crklosterman · Answer

The other thing that I would keep in mind is that today you are running OneFS 6.5.x, which is going End Of Primary Support in about 4 months. So keep in mind that an upgrade needs to be on your near-term horizon. That may or may not change the behavior you're seeing here, but it is something to start planning. The current target code release is OneFS 7.1.1.2. This is not the latest GA version, but our current target.

~Chris Klosterman

Senior Solution Architect

EMC Isilon Offer & Enablement Team

email: chris.klosterman@emc.com

technique1 · Answer

Hi cadiletta,

First, thank you for your answer.

I have in mind, of course, the two parameters you mention.

But the facts are: even on a Mac with no nfs.conf, or with an empty one, I have the same issue.

I do agree with you, though, and I will push my tests further, but I am not confident regarding the first point above.

I plan to create a script on the Mac that copies a file and copies it back about 5 times, and computes the md5 for comparison.

Then I'll modify the parameters (async, nolock, ...) until I obtain a reliable copy.

I was just wondering whether I am missing a configuration, specific to Isilon, which could explain this behavior...

Thank you for your help

Pierre
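For readers who want to reproduce this kind of test, here is a minimal sketch of a copy-and-compare script along the lines described above. It is not the script actually used in this thread: the function names `md5_of` and `round_trip`, the file naming scheme, and the default of 5 runs are all assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, read in chunks so huge files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip(source, remote_dir, local_dir, runs=5):
    """Copy source to remote_dir and back `runs` times; compare each MD5.

    remote_dir is assumed to be a directory on the NFS mount and local_dir
    a directory on a local disk (both hypothetical, supplied by the caller).
    """
    source = Path(source)
    reference = md5_of(source)
    results = []
    for run in range(1, runs + 1):
        remote_copy = Path(remote_dir) / f"{source.stem}.run{run}{source.suffix}"
        local_copy = Path(local_dir) / remote_copy.name
        shutil.copyfile(source, remote_copy)      # client -> NFS mount
        remote_md5 = md5_of(remote_copy)
        shutil.copyfile(remote_copy, local_copy)  # NFS mount -> client
        local_md5 = md5_of(local_copy)
        results.append({
            "run": run,
            "remote_ok": remote_md5 == reference,
            "back_ok": local_md5 == reference,
        })
    return reference, results
```

A call might look like `round_trip("/path/to/big.mov", "/Volumes/isilon/test", "/tmp/back")`, where all three paths are placeholders. One caveat: reading a file back immediately after writing it may be served from the client's cache rather than from the server, so it is probably worth remounting the share (or verifying from a second client) before trusting a matching checksum.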

crklosterman · Answer

Pierre,

It appears that there is a KB article related to this behavior:

166056 : OneFS: Mac OS X clients create a non-matching copy when copying a file to an Isilon cluster
https://support.emc.com/kb/166056

Based on my review, it would appear that moving your MTU from 1500 to 9K should alleviate the issue (on the cluster, the switches, and the client). From my review of a couple of related support cases, it appears to be a problem with the data the OSX NFS client sends while under heavy load. The packet captures seem to indicate that Isilon is taking what is sent over the wire and writing it as requested by the client, but the client is sending some garbled information (again, only intermittently under heavy load). It actually looks like it sends the same payload twice with two different offsets.

For further investigation, I would suggest you open an EMC Support SR to confirm that this is the issue you are experiencing (give them the KB number above), and at the same time, an Apple support case. What version of OSX are you running? Perhaps this has been fixed with a newer OSX release.
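To check whether a mismatching copy shows that pattern (the same payload landing at two offsets), one option is to compare the two files block by block and look up each corrupted block in the original. The sketch below is only an illustration: the function names are hypothetical, and it assumes that corruption is aligned to a fixed block size (65536 here, chosen to match the wsize in the mount options quoted in the question), which may not hold in practice.

```python
def differing_blocks(original, copy, block_size=65536):
    """Return (offset, length) for every block where the two files differ."""
    mismatches = []
    offset = 0
    with open(original, "rb") as a, open(copy, "rb") as b:
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break
            if block_a != block_b:
                mismatches.append((offset, max(len(block_a), len(block_b))))
            offset += block_size
    return mismatches


def find_source_offsets(original, payload, block_size=65536):
    """Return block-aligned offsets in original whose content equals payload."""
    matches = []
    offset = 0
    with open(original, "rb") as handle:
        while True:
            block = handle.read(block_size)
            if not block:
                break
            if block == payload:
                matches.append(offset)
            offset += block_size
    return matches


def describe_mismatches(original, copy, block_size=65536):
    """For each bad block in copy, list where the same bytes sit in original."""
    report = []
    for offset, length in differing_blocks(original, copy, block_size):
        with open(copy, "rb") as handle:
            handle.seek(offset)
            payload = handle.read(length)
        report.append((offset, find_source_offsets(original, payload, block_size)))
    return report
```

Each entry in the report pairs the offset of a bad block in the copy with the offsets in the original that hold the same bytes. A non-empty second element would be consistent with a payload written at the wrong offset; an empty one suggests some other kind of corruption. Note that every mismatch triggers a full scan of the original, so on 100GB files this is slow.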

Last: It's still in your best interests to work on upgrading your Isilon cluster to a newer release in the near term. Depending on your maintenance agreement EMC's support organization may be able to do that for you at no charge, or you can do it yourself. Even if this does not alleviate your issue, newer releases of OneFS are going to be easier to troubleshoot than 6.5.

Chris Klosterman

Senior Solution Architect

EMC Isilon Offer & Enablement Team

email: chris.klosterman@emc.com

twitter: @croaking

technique1 · Answer

Hi Chris, and others!

Just a small post to tell you this thread is not neglected. I am continuing to test and trying to find a correct configuration. Things are arduous because the described behavior is not systematic and only happens with big files (100G min), so tests are very long.

As I said, I created a script which computes the md5, copies the file 5 times, computes the md5 of each copy, copies the 5 files back, and computes the md5 again.

The findings are:

- With the Isilon parameters in nfs.conf and sysctl.conf, copies take longer than with empty nfs.conf and sysctl.conf.
- Whatever the configuration, I still get md5 differences randomly.
- The md5 differences always occur when I copy from Macs to Isilon.
- On the old Mac Pro hardware, one of the Isilon parameters makes the Mac reboot while copying (from Mac to Isilon as well). I have to investigate more to find which parameter makes the Mac reboot.

I haven't tested the MTU tip yet because it could affect other computers in production.

Thank you, Chris, for your suggestion. Unfortunately, I can't read the KB anymore because our Isilon is no longer under support. For the same reason, I can't upgrade it (even if I would like to...). We asked Isilon France what arrangement we could find to do it without being under support, but got no answer.

So here I am, with the challenge of solving this problem under those terms.

Thanks everyone!

Pierre

dynamox · Answer

I remember 3 years ago we had an issue with large transfers, which ended up being a Nexus 5k NX-OS bug. It would be interesting to test from a Redhat/CentOS box, or maybe you can plug your OSX box directly into the cluster, bypassing any switching in between.
