
Re: Ask the Expert: Isilon Performance Analysis

I will answer the second part of your question first. SmartCache is enabled by default for directories and files optimized for concurrent access (the default) or streaming access. The random access optimization implicitly disables write cache coalescing by default.

Buffered writes arriving over SMB, NFS, or another protocol are coalesced in a write buffer. For example, a sequence of proto.nfs write operations of 32 KB each at adjacent offsets is buffered up to 1 MB or 2 MB (concurrent or streaming optimization, respectively). This buffer is cluster wide and cache coherent across all nodes. At some point we write the data out, using both data protection and disk space utilization as guides to which nodes are written to. The journaled writes leverage our NVRAM as part of the optimized write to disk storage.
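To make the coalescing idea concrete, here is a minimal Python sketch. It is only an illustration of merging adjacent writes into a bounded buffer, not how OneFS implements its write cache. The function name `coalesce_writes` and the `(offset, length)` tuple format are my own assumptions; the 32 KB write size and the 1 MB limit come from the example above.

```python
from typing import List, Tuple

KIB = 1024
MIB = 1024 * KIB


def coalesce_writes(writes: List[Tuple[int, int]],
                    max_buffer: int = 1 * MIB) -> List[Tuple[int, int]]:
    """Merge writes at adjacent offsets into buffers of at most max_buffer bytes.

    Each write is an (offset, length) pair. Illustrative only: the real
    write cache is cluster wide and far more involved than this.
    """
    buffers: List[Tuple[int, int]] = []
    for offset, length in writes:
        if buffers:
            start, size = buffers[-1]
            contiguous = offset == start + size
            # Extend the current buffer only if the write lands right
            # after it and the buffer would not exceed the size limit.
            if contiguous and size + length <= max_buffer:
                buffers[-1] = (start, size + length)
                continue
        # Otherwise start a new buffer at this write.
        buffers.append((offset, length))
    return buffers


# 64 sequential 32 KB writes (2 MB total) collapse into two 1 MB buffers.
sequential = [(i * 32 * KIB, 32 * KIB) for i in range(64)]
print(coalesce_writes(sequential))
```

With writes at scattered offsets nothing is adjacent, so every write stays its own small operation, which is the intuition behind random optimization not coalescing.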

Now for the first part of your question. A simple way to measure your buffered writes is to measure the latency seen for protocol write operations:

isi statistics protocol --class=write --orderby=timeavg --top

In OneFS 7.x you should see write latencies in the microsecond range. When they climb into the millisecond range, the two simple reasons would be:

1) The journal cannot flush writes to disk fast enough for the rate of change. This is another way of saying that there are insufficient disks in the node pool to satisfy the demand.

isi statistics drive -nall --orderby=timeinq --long --top

You might note that the sum of OpsIn (writes) + OpsOut (reads) exceeds the normal range for the disk type. You would also see more than 1 queued I/O. The more I/O is queued, the stronger the case for increasing spindle count. Adding nodes almost immediately brings new disks into the fold.

2) The write workflow is not buffering, meaning that it is setting a direct I/O flag. In this case NVRAM is still leveraged; however, the I/Os are no longer coalesced into more optimal operations.
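The two checks above can be strung together as a rough triage. The Python sketch below assumes you have already pulled the numbers out of the isi statistics output yourself; it does not parse that output. The names `DriveSample`, `busy_drives`, and `triage` are hypothetical, the latency is assumed to be in microseconds, and `max_ops` stands in for whatever you consider a normal ops range for your disk type, which I am not giving a number for here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DriveSample:
    name: str       # drive label as reported by your own tooling (hypothetical)
    ops_in: float   # write operations per second
    ops_out: float  # read operations per second
    queued: float   # queued I/O for the drive


def busy_drives(drives: List[DriveSample], max_ops: float,
                max_queued: float = 1.0) -> List[DriveSample]:
    """Return drives with more than max_queued queued I/O, or whose
    ops_in + ops_out exceeds max_ops (a caller-supplied assumption)."""
    return [d for d in drives
            if d.queued > max_queued or d.ops_in + d.ops_out > max_ops]


def triage(write_latency_us: float, drives: List[DriveSample],
           max_ops: float, uses_direct_io: bool) -> str:
    """Rough first guess at why protocol write latency is high."""
    if write_latency_us < 1000:
        return "ok"             # still in the microsecond range
    if busy_drives(drives, max_ops):
        return "spindle-bound"  # reason 1: not enough disks for the demand
    if uses_direct_io:
        return "unbuffered"     # reason 2: direct I/O, no coalescing
    return "inconclusive"


sample = [DriveSample("drive-a", 90.0, 40.0, 0.2),
          DriveSample("drive-b", 150.0, 80.0, 3.5)]
print(triage(4200.0, sample, max_ops=200.0, uses_direct_io=False))
```

Treat the result as a pointer for where to look next, not a diagnosis. The spindle check comes first because that is the more likely of the two causes.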

A paper that discusses SmartPools and node pools

There are sysctls that monitor the write cache, but they are non-trivial to interpret, which is why I suggested a simpler approach. In other words, you are more likely to be spindle challenged than NVRAM or write cache challenged.
