
April 13th, 2015 08:00

What is the difference between native NFS vs. NFS on OneFS?

Hi All,

I've started investigating a very strange issue, and so far no one has had a solution for it.

Maybe some of the NFS gurus here can help.

Client: CentOS 6.6, 64-bit (everything runs NFSv3)

Native NFS server export: the export comes from a Linux server with 2 local 15K SAS drives.

OneFS NFS server export: the export comes from an Isilon S200 running OneFS, with data on SSD.


The mount options are the same for both exports on the client side.

Sync writes are disabled on the Isilon.


The Scenario:

  • Copy from client to the native NFS server - 0m35s
  • Copy from client to the Isilon NFS server - 1m55s

Is there any explanation for why the NFS protocol behaves this way on Isilon?

Remarks: tcpdump shows that the same action against the Isilon requires double the number of requests/answers between client <-> Isilon.

No difference in MTU, no difference in the network... same scenario, different target.




125 Posts

April 13th, 2015 08:00

> The mount options are the same for both exports on the client side.


Can you please post the mount string/options here?


> Sync writes are disabled on the Isilon.


Please describe how you did this in OneFS.


> Copy from client to the Isilon NFS server - 1m55s


Please post how you did the copy (e.g. "cp", "dd", etc.), as well as the size of the file you used.


> tcpdump shows that the same action against the Isilon requires double the number of requests/answers


Maybe a block size difference.  Are the request sizes the same between the two copies?

32 Posts

April 13th, 2015 09:00

Mount options:


rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.1.30.7,mountvers=3,mountport=300,mountproto=tcp,local_lock=none,addr=10.1.30.7 0 0
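For reference, that roughly corresponds to a client mount invocation like the one sketched below (the export path and mount point are placeholders; the options are copied from the line above, most of them NFSv3 defaults):

# placeholder export path and mount point; options taken from the /proc/mounts line above
mount -t nfs -o rw,vers=3,proto=tcp,hard,rsize=131072,wsize=131072,timeo=600,retrans=2,sec=sys 10.1.30.7:/export /mnt/nfs-test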

How sync writes were disabled on the Isilon:

isi nfs settings export modify --setattr-asynchronous=yes

How the copy was made:

Untar of a linux.tar file: 700 MB, ~50K file objects.
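For anyone wanting to reproduce it, the test boiled down to something like the sketch below (mount points and tarball name are placeholders):

# same client, same tarball, two different NFS targets
time tar -xf linux.tar -C /mnt/nfs-native/untar-test
time tar -xf linux.tar -C /mnt/nfs-isilon/untar-test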

tcpdump:

The block sizes are the same.
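The request/answer comparison came from captures along these lines (the interface name is a placeholder; the display filter assumes a reasonably recent Wireshark/tshark build):

# capture the NFS conversation during the untar, then count the NFS packets in it
tcpdump -i eth0 -s 0 -w isilon.pcap host 10.1.30.7 and port 2049
tshark -r isilon.pcap -Y nfs | wc -l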

April 13th, 2015 10:00

You haven't discovered anything strange, nor have you discovered anything unexpected.

The Isilon OneFS architecture has tradeoffs, as does every other platform.  One of the tradeoffs of a scale-out architecture is that creates and deletes will be slow.  Other operations will be fast.  The tradeoff you've made in your test is that you've thrown most of the redundancy out the window and provided no way for your environment to scale.  What happens when you have a lot of clients hitting your server?  Will it scale linearly?  Up to what point?

In our environment, what matters most is the wall-clock time of the entire application spread across hundreds of servers.  This is a mixture of a large number of metadata operations and many reads and writes.  The Isilon wins.  I've measured 1,000,000 NFS operations per second load-balanced across 28 S200 nodes.  I hit 200-300K NFS operations per second just about all day, every day.  This is getting mission-critical production workloads done.  I regularly provide 80Gbps out of my Isilon cluster to my clients, and it rarely drops below 20Gbps (24x7).

Taking a single tarball and extracting it and saying you've "won" the benchmark is sadly misguided.  If that's the operation you do every day, and that's *all* you do, then yes, the Isilon will probably lose and you've bought the wrong platform.  On the other hand, if you actually want to do something with that tarball, and provide redundancy if your single Linux system fails, and scale as you grow, then you may have a competition on your hands.

In my Isilon environment, I can lose 2 ENTIRE Isilon nodes - and all of their combined 48 spindles and SSDs - and not lose access to data.  The users may see a small pause as the IP addresses fail over, but the data is still there, along with access to that data.  In your case, you can have a single RAID controller or chunk of memory go bad the wrong way and potentially lose the data.  If you need to upgrade the kernel, you'll lose access to the data.  If you need to restart your NFS server for any reason, you'll lose access to the data.  I can lose multiple drives and not lose access to the data, nor run out of redundancy while those drives are being repaired.  Imagine your single server having a drive fail.  Are you 100% sure you can rebuild a failed drive in a timely manner before the 2nd drive fails?  Are you willing to bet your job on it?  Does your little Linux server provide snapshots for data recovery?  Off-node replication at the block level?  Without significantly impacting performance?

We had a DDN storage array here for our workloads a couple of years ago.  It's designed for high performance computing environments in supercomputer centers.  The Isilon kicked its butt out the door.

125 Posts

April 13th, 2015 12:00

> If there is no difference in packet size, I would deep-dive into the trace and look for failures or something. There has to be an explanation for this.

Indeed.  Using my git repository, I took packet captures when cloning to a Linux NFS server and to an Isilon NFS server.  The number of NFS packets was 30404 and 30406, respectively.  Both NFS servers were pretty close to "out of the box" in terms of configurations.

So something does seem a bit strange...

32 Posts

April 13th, 2015 12:00

Hi,

The informative response is much appreciated, and of course I agree with you on all aspects regarding scale, performance, etc.

The question I'm asking is about a very narrow scenario, therefore I'm discarding the other "real" advantages and focusing on this scenario.

I just wanted to take the advantages you mentioned out of the equation.

32 Posts

April 13th, 2015 12:00

Sluetze,

Thanks. After 12 hours of deep-diving with no retransmissions/failures, it's a mystery.

300 Posts

April 13th, 2015 12:00

ed,

What you describe would explain the higher latency / server time needed to execute the commands issued by the client. And of course you are right on this point.

But for me there is no explanation for the "double amount of requests/answers" part.

If there is no difference in packet size, I would deep-dive into the trace and look for failures or something. There has to be an explanation for this.

But I do not have a lot of experience with NFS, so I'm out.

Best Regards

sluetze

32 Posts

April 13th, 2015 13:00

I'll try to check whether the TCP capture is different again, although I'm not counting on this to explain the global behavior.

I'll check it again and post an update; other than that, thank you for the answers you've provided.

1.2K Posts

April 14th, 2015 09:00

Much simpler: Try unpacking the test tar file locally on the Linux server and on the Isilon.

The difference comes mostly from the file system rather than from the NFS protocol or other network parameters...

Linux filesystems create files at seemingly blazing speed BUT they lie to the user/applications -- "thanks" to write caching in RAM(!), Linux reports successful file creations and data writes back to the applications LONG BEFORE the files are safely written to disk...
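A rough way to make that visible is to include the flush in the timing when unpacking locally (sketch only; paths are placeholders):

# untar alone mostly measures the RAM write cache; adding 'sync' includes the flush to disk
time tar -xf linux.tar -C /data/untar-test
time sh -c 'tar -xf linux.tar -C /data/untar-test2 && sync'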

Isilon caches metadata and data writes in non-volatile memory (NVRAM) on multiple nodes, well protected and still pretty fast, but can't beat local RAM.

-- Peter

125 Posts

April 14th, 2015 09:00

> Isilon caches metadata and data writes in non-volatile memory (NVRAM) on multiple nodes, well protected and still pretty fast, but can't beat local RAM.

While this is true, OneFS will still use its write cache too, depending on the filesystem configuration (the default is to use it).  So writes are actually coalesced in volatile RAM before they are flushed to the journal, which is non-volatile and considered stable storage as far as OneFS is concerned.

This is also true when NFS is being used.  Client writes by default will use the OneFS write cache, unless disabled on the Isilon, via a 'sync' mount option, or via an O_DIRECT or O_SYNC open flag in the client application.
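As a concrete example of that last point, a client mount along these lines would force stable writes against the Isilon and bypass the write-back behavior (host, export path, and mount point are placeholders):

# 'sync' makes the client wait for stable writes instead of relying on write-back caching
mount -t nfs -o vers=3,proto=tcp,sync isilon-smartconnect:/ifs/data /mnt/isilon-sync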

125 Posts

April 14th, 2015 09:00

> I'll try to check whether the TCP capture is different again, although I'm not counting on this to explain the global behavior.

I think it's been established that for a single-threaded workload like 'tar' or 'git' that consists of lots of metadata reads and writes and small file creates, writes, and reads, OneFS will not have the raw performance of a Linux NFS server or even other types of headed NAS architectures.

However, that doesn't mean that the performance difference you're seeing, and the "twice as many" NFS packets in this particular instance, are expected.  In other words, OneFS being over 2x slower than the Linux server might be due to other issues, not simply because OneFS has a bit more per-operation latency on things like creates and small file I/O.  I find it very coincidental, for example, that your test timings are about 2x different between Linux and OneFS, *and* you're seeing 2x the NFS traffic as well when testing against the Isilon.

My recommendation is to open a Support case on this, if you haven't already, to at least figure out why the NFS conversation with OneFS is so much more verbose than with Linux...

1.2K Posts

April 14th, 2015 10:00

That's the point: tar operates sequentially, and before proceeding to the next file it waits for the current file to be closed, which implies a stable acknowledgement (even when O_DIRECT or O_SYNC are not set).

So one incurs the Endurant Cache latency (plus network latency in the case of NFS) per file, which can hurt a bit when many very small files are being created...
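Back-of-the-envelope, using the numbers from the original post: the gap is 1m55s - 0m35s = 80 seconds over roughly 50,000 files, so about 1.6 ms of extra per-file latency is already enough to account for the entire difference.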

-- Peter

125 Posts

April 14th, 2015 10:00

At least in the 'git clone' test, writes are unstable, though each write is followed immediately by a COMMIT (implying an fsync() I would guess).  Because the write is unstable, Endurant Cache isn't used, and unless something else was changed, the OneFS write cache *is* being used (and being flushed constantly).  As you say, here's where the small latencies in OneFS add up.
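If the captures are still around, a per-procedure breakdown along these lines would show how much of the extra traffic is WRITE/COMMIT pairs (capture file name is a placeholder; field names as in reasonably recent Wireshark/tshark builds):

# NFSv3 procedure 7 = WRITE, 21 = COMMIT
tshark -r isilon.pcap -Y 'nfs.procedure_v3 == 7' | wc -l
tshark -r isilon.pcap -Y 'nfs.procedure_v3 == 21' | wc -l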

Again, I'm not arguing performance (though I still think this particular issue needs some more investigation); I was commenting on the general fact that typically OneFS *will* use a volatile RAM cache for writes, unless directed not to, either within OneFS itself or via the client stack...

32 Posts

April 14th, 2015 13:00

Thank you very much, guys.

Kip, of course there is already an ongoing SR, which they dismissed with "this is typical behavior" and the statement that they don't see any performance issues from the Isilon storage side.

After reading your PDF and our discussion here, I have much deeper insight into the fact that maybe Isilon isn't the correct platform for this.

If you want, I can share the SR with you privately so you can review it.

1.2K Posts

April 15th, 2015 03:00

Kip, of course this is all fine as long as OneFS acknowledges writes/commits to the client as unstable, not stable, while the data is only in volatile RAM. Which certainly is what OneFS does... oh wait, perhaps unless we also use --commit-asynchronous True in addition to --setattr-asynchronous True (and setting both to True actually speeds up the test considerably, at the cost of losing blocks on crashes).
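In other words, the combination being described is roughly the following (sketch only; the flag names are the ones quoted in this thread, the exact boolean syntax may differ between OneFS releases, and with both set the data sits only in RAM until it is flushed):

# hypothetical combined form of the two settings discussed above -- faster, but riskier
isi nfs settings export modify --setattr-asynchronous=yes --commit-asynchronous=yes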

Thanks for the great discussion!

-- Peter
