Peter_Sero

Re: Isilon storage performance issue

A couple of thoughts and suggestions:

http://www.emc.com/collateral/hardware/white-papers/h10719-isilon-onefs-technical-overview-wp.pdf

is really worth reading to learn more about the filesystem layout. And questions similar to yours have been discussed here recently:

Isilon overhead for 174kb files

How much space does Isilon consume when writing a 128kb file with N+2:1 and N+3:1

With all that information at hand, it is really fun to examine a file's disk/block layout as reported by isi get -DD "file".
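For example (the path below is just made up, use one of your own test files):

isi get -DD /ifs/data/test/somefile.dat

The detailed output reports the file's protection setting and its disk/block layout, which makes the overhead for small files quite visible.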

Furthermore:

> 5.If I change filesystem block size from 8K to 32K , does it means my stripe unit will be now of 16X32K

I don't think you can do so - which exact setting are you referring to?

> 6.OneFS uses 16 contiguous block to create one stripe unit , can we change 16 to some other value?

I can't imagine so, but the access pattern setting controls whether a larger or smaller number of disks per node is used (under the constraint of the chosen protection level).

> 7.Can we access Isilon storage cluster from compute node (install RHEL) using SMB protocol, as I read in performance benchmark from storage council that SMB performance is almost double compare to NFS in terms of IOPS?

In benchmarks, SMB IOPS appear higher than NFS IOPS because the set of protocol operations differs even for identical workloads, not to mention that different workloads are often used. You cannot directly compare the resulting values...

For your original test you might max out the disk IOPS (xfers), but you could also get stuck at a certain rate of your "application IOPS" while seeing little or no disk activity at all(!) -- because your data is mostly or entirely in the OneFS cache. Check the "disk IOPS" or xfers, including the average size per xfer, with

isi statistics drive -nall -t --long --orderby=OpsOut

and cache hit rates for data (level 1 & 2) with:

isi_cache_stats -v 2
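If you want to see how these numbers develop over the course of a test run, a simple timestamped sampling loop is enough -- just a sketch, the interval and the log file name are arbitrary, and the cache stats can be collected the same way:

while true; do
  date
  isi statistics drive -nall -t --long --orderby=OpsOut
  sleep 10
done | tee /ifs/drive_stats.log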

In case of very effective caching the IOPS will NOT be limited by disk transfers (so all that filesystem block size reasoning doesn't apply).

Instead the limit is imposed by CPU usage, or network bandwidth, or by protocol (network + execution) latency, even if CPU or bandwidth < 100%.

In the latter case, issuing more requests in parallel should help (it seems you are on the right track there anyway with multiple jobs).
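Just to illustrate what "more requests in parallel" can look like from a Linux NFS client -- the mount point and file names below are made up, adjust them to your test data:

for i in $(seq 1 16); do
  dd if=/mnt/isilon/testdir/file_$i of=/dev/null bs=128k &
done
wait

Whether this actually helps depends on where the bottleneck really is (CPU, network bandwidth or latency, as above).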

To check protocol latencies, use "isi statistics client" as before and add --long:

isi statistics client --orderby=Ops --top --long

This will show the latency times as TimeMax, TimeMin and TimeAvg (these columns are also useful for --orderby=... !).
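For instance, to list the operations with the highest average latency first (assuming the time columns are accepted as sort keys, which the hint above suggests):

isi statistics client --orderby=TimeAvg --top --long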

Good luck!

Peter