Peter_Sero

Re: Isilon storage performance issue

Hi Damal


This is roughly how I use these commands to investigate

performance issues in new situations:


> isi statistics pstat


Bottom half: Overall view to see what's going on.

Compare Network -- Filesystem -- Disk  throughputs

to see whether they are consistent with what is expected

for the workflow as far as it is known. In a typical NAS workflow,

network throughput should match filesystem throughput.

Write throughput should match disk write throughput (mind the protection overhead).

Read throughput matches disk read throughput for uncached workflows,

or can be much higher with good caching.

High disk activity without network or filesystem activity indicates

some internal job is running (restripes etc).

High CPU without any network/filesystem/disk

activity would be very strange

(for example, some process running wild).

Just illustrating how things can be learned from pstat.
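
The throughput comparison above can be written down as a small calculation. This is only a sketch: the numbers are typed in by hand rather than parsed from pstat output, the N+M protection layout and the 20% tolerance are my assumptions, and expected_disk_write and is_consistent are hypothetical helper names.

```python
def expected_disk_write(fs_write_mbps, data_units, parity_units):
    """Filesystem write rate scaled by an assumed N+M protection overhead."""
    return fs_write_mbps * (data_units + parity_units) / data_units


def is_consistent(observed, expected, tolerance=0.2):
    """True if observed lies within the relative tolerance of expected."""
    if expected == 0:
        return observed == 0
    return abs(observed - expected) / expected <= tolerance


# Hypothetical readings in MB/s, copied by hand from the bottom half of pstat.
net_in, fs_write, disk_write = 400.0, 400.0, 510.0

# Assumption: 8 data units + 2 protection units per stripe.
expected = expected_disk_write(fs_write, data_units=8, parity_units=2)

print("network vs filesystem consistent:", is_consistent(net_in, fs_write))
print("disk writes vs expectation consistent:", is_consistent(disk_write, expected))
```

If the second check fails while the first passes, that would point towards extra disk activity that the clients did not cause, such as an internal job.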

Top half of pstat is protocol specific,

might need to run separately for NFS and SMB.

Again, quick check for consistency:

Does the observed mix of reads/writes etc. make sense

in the light of the assumed workflow?

> isi statistics system --nodes --top

Quick check whether the load is well balanced across the cluster,

and how the protocols are used (SMB vs NFS etc).

Can also indicate where physical network bandwidth hits the max.
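
Both checks can be expressed as simple ratios. A rough sketch, assuming the per-node rates are read off the screen by hand; imbalance and link_utilization are hypothetical helper names, and the 10 Gbit/s link speed is just an example value.

```python
def imbalance(per_node_rates):
    """Ratio of the busiest node's rate to the mean rate (1.0 means perfectly even)."""
    mean = sum(per_node_rates) / len(per_node_rates)
    return max(per_node_rates) / mean if mean else 0.0


def link_utilization(rate_mbytes_per_s, link_gbit_per_s):
    """Fraction of the nominal link bandwidth in use (1 Gbit/s taken as 1000 Mbit/s)."""
    return rate_mbytes_per_s * 8 / (link_gbit_per_s * 1000)


# Hypothetical per-node network throughput in MB/s.
rates = [100, 110, 90, 900]

print("imbalance factor:", imbalance(rates))
print("busiest node link utilization:", link_utilization(max(rates), 10))
```

An imbalance factor far above 1 suggests that clients are concentrated on a few nodes; a utilization close to 1 suggests the physical link is the limit.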

> isi statistics client --orderby=Ops --top

Who is causing the load?

Use --orderby=In or --orderby=Out for throughput, or --orderby=TimeAvg for latency.

> isi statistics client --orderby=Ops --top --long

With --long there are InAvg and OutAvg, which denote the request ("block") sizes

(NOT the average of In and Out rates!!).  Small request sizes

often indicate suboptimal configs on the client side.
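
To make the distinction concrete: a request size is bytes per operation, not a mean of two rates. The sketch below approximates it as throughput divided by ops; I am not certain that this is exactly how the InAvg column is computed, so treat it as an illustration. The client names, numbers, and the 32 KiB threshold are made up.

```python
def avg_request_size(bytes_per_s, ops_per_s):
    """Approximate bytes per request from a throughput and an operation rate."""
    return bytes_per_s / ops_per_s if ops_per_s else 0.0


SMALL_REQUEST_BYTES = 32 * 1024  # assumed threshold, tune for your workflow

# Hypothetical clients: (name, inbound bytes/s, ops/s). Same throughput, different ops.
clients = [
    ("client-a", 41_943_040, 10_240),
    ("client-b", 41_943_040, 40),
]

for name, rate, ops in clients:
    size = avg_request_size(rate, ops)
    note = "small requests, check client config" if size < SMALL_REQUEST_BYTES else "ok"
    print(f"{name}: {size:.0f} bytes/request ({note})")
```

Both hypothetical clients move the same 40 MiB/s, but the first does it in 4 KiB requests and the second in 1 MiB requests.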

> isi statistics drive -nall -t --long --orderby=OpsOut

Do the disk activities max out? Also try --orderby=OpsIn or --orderby=TimeInQ.

Do the disk activities match the assumed workload?

Small SizeIn and SizeOut request sizes indicate metadata ops

or small random IO ops; pure streaming reads/writes are usually up to 64k.

Disclaimer: Those are just my personal favorites (plus isi statistics heat),

and I might err on interpretations; I am not aiming at convincing anyone.

Cheers

-- Peter