Isilon: Check your cluster for bottlenecks

Summary: How to identify bottlenecks on your cluster.

This article is not tied to any specific product. Not all product versions are identified in this article.

Instructions

This article is divided into several sections:

1.) CPU Utilization
2.) Filesystem Utilization

CPU Utilization
This section describes how to check the cluster for CPU overutilization.

Step 1:
Determine the number of available cores per node:
 
isi_for_array isi_hw_status | grep -i proc
Isilon8005-1:    PROC: Single-proc, -core
Isilon8005-1#

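Note the number of processors per node and the CPU type (quad-core = 4, hexa-core = 6, octa-core = 8 cores), then multiply the cores per processor by the number of processors.
The result is the total core count of the node, which is the upper limit for the load the CPUs can handle.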
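
The multiplication above can be sketched as a small shell helper. The function name total_cores and its two arguments are hypothetical; both values still have to be read manually from the isi_hw_status output.

```shell
# Hypothetical helper: total cores per node = processors x cores per processor.
# Both inputs are taken by hand from the isi_hw_status output.
total_cores() {
    procs=$1           # number of processors in the node
    cores_per_proc=$2  # 4 for quad-core, 6 for hexa-core, 8 for octa-core
    echo $((procs * cores_per_proc))
}

# Example: a dual-processor node with hexa-core CPUs
total_cores 2 6   # prints 12
```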

Step 2:
Check the current utilization:
 
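isi_for_array echo `hostname`;uptime | grep -i load | awk '{print "Load average last 1 minute: " $8"\nLoad average last 5 minutes: "$9"\nLoad average last 15 minutes: "$10}'
Isilon8005-1: Isilon8005-1
Load average last 1 minute: 1.29,
Load average last 5 minutes: 1.58,
Load average last 15 minutes: 1.57
Isilon8005-1#

uptime reports the load averages for the last 1, 5, and 15 minutes, in that order. As written, only the echo is passed to isi_for_array; uptime runs on the local node, so repeat the command on each node. The field positions ($8 to $10) depend on the uptime output format and can shift, for example when the uptime line includes a day count.

None of the individual values should be higher than the core count of the node (see Step 1).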
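
A minimal sketch of the same comparison that does not depend on field positions: it takes everything after the last colon of the uptime line as the three load averages. The function name check_load is hypothetical, and the core count has to be supplied from Step 1.

```shell
# Hypothetical helper: reads one uptime line on stdin and compares each
# load average with the core count passed as the first argument.
check_load() {
    awk -v cores="$1" '
    BEGIN { label[1] = "1 minute"; label[2] = "5 minutes"; label[3] = "15 minutes" }
    {
        # The text after the last colon holds the three load averages
        n = split($0, parts, ":")
        gsub(/,/, " ", parts[n])
        split(parts[n], avg, " ")
        for (i = 1; i <= 3; i++) {
            msg = "ok"
            if (avg[i] + 0 > cores + 0) msg = "ABOVE core count"
            printf "Load average last %s: %s (%s)\n", label[i], avg[i], msg
        }
    }'
}

# Usage on a node with 2 cores:
#   uptime | check_load 2
```

The sketch assumes that the load averages are printed with a decimal point and are the last three values on the line.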

If the cluster is running OneFS 8.0.0.3 or later, the following statistics command can be used instead:
isi statistics query current list --nodes=all --keys=node.load.15min,node.load.5min,node.load.1min,node.cpu.count
 Node  node.load.15min  node.load.5min  node.load.1min  node.cpu.count
----------------------------------------------------------------------
    1              159             165             173               2
----------------------------------------------------------------------
Total: 1

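This output shows the load averages in hundredths (1/100ths) alongside the CPU (core) count. In the example, 159 corresponds to a load average of 1.59 on a node with 2 cores.

Ideally, each value divided by 100 should be less than the number of available cores.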
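
The conversion from hundredths can be sketched with awk. The function name load_vs_cores is hypothetical, and the sketch assumes the column order produced by the --keys list shown above (15min, 5min, 1min, cpu count).

```shell
# Hypothetical helper: reads the "isi statistics query current list" output
# on stdin, converts the load values from hundredths and compares them with
# node.cpu.count. Header, separator and "Total:" lines are skipped.
load_vs_cores() {
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ {
        l15 = $2 / 100; l5 = $3 / 100; l1 = $4 / 100; cores = $5 + 0
        verdict = "ok"
        if (l15 > cores || l5 > cores || l1 > cores) verdict = "overloaded"
        printf "node %s: 15min=%.2f 5min=%.2f 1min=%.2f cores=%d %s\n", $1, l15, l5, l1, cores, verdict
    }'
}

# Usage:
#   isi statistics query current list --nodes=all \
#     --keys=node.load.15min,node.load.5min,node.load.1min,node.cpu.count | load_vs_cores
```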

Filesystem Utilization
This section describes a top-level file system check with a focus on hard drives:

Step 1:
List the statistics for one drive type (SAS, SSD, or SATA):
isi statistics drive list --long --type={SAS|SSD|SATA}
 Timestamp  Drive  Type  OpsIn  BytesIn  SizeIn  OpsOut  BytesOut  SizeOut  TimeAvg  Slow  TimeInQ  Queued  Busy  Used  Inodes
------------------------------------------------------------------------------------------------------------------------------
1529750006    1:2   SAS    2.2    15.0k    6.8k     0.0       0.0      0.0      0.0   0.0      0.0     0.0  0.1% 10.0%    3.4k
1529750006    1:3   SAS    2.4    19.7k    8.2k     0.0       0.0      0.0      0.0   0.0      0.0     0.0  0.1%  9.7%    3.3k
1529750006    1:4   SAS    1.6    14.7k    9.2k     0.0       0.0      0.0      0.0   0.0      0.0     0.0  0.0% 12.3%    3.3k
1529750006    1:5   SAS    1.0     8.2k    8.2k     0.0       0.0      0.0      0.0   0.0      0.0     0.0  0.0% 14.7%    3.3k
1529750006    1:6   SAS    2.0    16.4k    8.2k     0.0       0.0      0.0      0.0   0.0      0.0     0.0  0.0% 11.3%    3.3k
1529750006    1:7   SAS    1.2    27.9k   23.2k     0.0       0.0      0.0      0.0   0.0      0.0     0.0  0.0% 11.8%    3.3k
------------------------------------------------------------------------------------------------------------------------------
This command lists each drive of the selected type (SAS, SATA, or SSD) with its individual statistics. The following columns are the most important:
  • Used
  • Busy
  • Queued
  • TimeInQ
Step 2:
The output can be refined further by appending the --sort parameter:
 
isi statistics drive list --long --type=SAS --sort=busy --nodes=all
 Timestamp  Drive  Type  OpsIn  BytesIn  SizeIn  OpsOut  BytesOut  SizeOut  TimeAvg  Slow  TimeInQ  Queued  Busy  Used  Inodes
------------------------------------------------------------------------------------------------------------------------------
1529750499    1:2   SAS    1.4     8.4k    6.0k     0.0       0.0      0.0      0.0   0.0      0.0     0.0  0.0% 10.0%    3.4k
1529750499    1:3   SAS    1.0     5.1k    5.1k     0.0       0.0      0.0      0.0   0.0      0.0     0.0  0.0%  9.7%    3.3k
1529750499    1:4   SAS    2.0    13.3k    6.7k     0.0       0.0      0.0      0.0   0.0      0.0     0.0  0.0% 12.4%    3.3k
1529750499    1:5   SAS    1.6    10.0k    6.3k     0.0       0.0      0.0      0.0   0.0      0.0     0.0  0.0% 14.7%    3.3k
1529750499    1:6   SAS    1.0     6.7k    6.7k     0.0       0.0      0.0      0.0   0.0      0.0     0.0  0.0% 11.3%    3.3k
1529750499    1:7   SAS    1.4    19.9k   14.2k     0.0       0.0      0.0      0.0   0.0      0.0     0.0  0.0% 11.8%    3.3k
------------------------------------------------------------------------------------------------------------------------------

Available sort options:
access_latency
access_slow
busy
bytes_in
bytes_out
drive_id
iosched_latency
iosched_queue
time
type
used_bytes_percent
used_inodes
xfer_size_in
xfer_size_out
xfers_in
xfers_out

Note: The disk busy rate should not constantly exceed 90%.
  General guidelines:
  • TimeInQ and Queued should not constantly exceed 10 for SATA drives. SAS and SSD drives have higher performance
  • The Used column should not exceed 90%
These checks can be run per node or cluster-wide by using the --nodes option.
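
The guidelines above can be applied to the --long output with a short awk filter. This is a sketch only: the function name flag_drives is hypothetical, the thresholds are the guideline values from this article, and the column positions assume the 16-column layout shown above. A single sample does not show whether a value is constantly high, so run it several times.

```shell
# Hypothetical helper: reads "isi statistics drive list --long" output on
# stdin and prints every drive that exceeds one of the guideline values.
flag_drives() {
    awk -v busy_max=90 -v used_max=90 -v queue_max=10 '
    NF == 16 && $1 ~ /^[0-9]+$/ {
        timeinq = $12 + 0; queued = $13 + 0
        busy = $14 + 0; used = $15 + 0     # "+ 0" drops the trailing % sign
        why = ""
        if (busy > busy_max)     why = why " busy"
        if (used > used_max)     why = why " used"
        if (queued > queue_max)  why = why " queued"
        if (timeinq > queue_max) why = why " timeinq"
        if (why != "") print $2 " exceeds:" why
    }'
}

# Usage:
#   isi statistics drive list --long --type=SAS --nodes=all | flag_drives
```

No output means that no drive exceeded a guideline value in that sample.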

Step 3:
Evaluate the cluster-wide IOPS to determine whether the cluster is being overused.

The following guidelines are commonly accepted for SATA drives:
  • Random I/O workload: 60 IOPS per drive
  • Mixed I/O workload: 80 IOPS per drive
  • Large I/O workload: 100 IOPS per drive

The first requirement is to get the number of drives in the cluster:
isi statistics drive list --type=SAS --nodes=all | grep Total
Total: 6

Multiply the number of drives by the per-drive guideline for the workload to get the upper limit for the overall cluster IOPS.
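
As a worked example of this multiplication, the following sketch uses the SATA guideline values listed above. The function name iops_ceiling and the workload keywords (random, mixed, large) are hypothetical.

```shell
# Hypothetical helper: upper limit for cluster IOPS = drives x per-drive guideline.
iops_ceiling() {
    drives=$1
    case $2 in
        random) per_drive=60 ;;    # random I/O workload
        mixed)  per_drive=80 ;;    # mixed I/O workload
        large)  per_drive=100 ;;   # large I/O workload
        *) echo "unknown workload: $2" >&2; return 1 ;;
    esac
    echo $((drives * per_drive))
}

# Example: the 6 drives counted above with a random I/O workload
iops_ceiling 6 random   # prints 360
```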

The current cluster IOPS can be measured with the following command:
isi statistics pstat list --interval=2 --repeat=10 | grep iops
MB/s             0.00        MB/s             0.00        Disk        1.80 iops
MB/s             0.00        MB/s             0.00        Disk        1.80 iops
MB/s             0.00        MB/s             0.00        Disk        1.80 iops
MB/s             0.00        MB/s             0.00        Disk       58.80 iops
MB/s             0.00        MB/s             0.00        Disk       58.80 iops
MB/s             0.00        MB/s             0.00        Disk       58.80 iops
MB/s             0.00        MB/s             0.00        Disk       25.00 iops
MB/s             0.00        MB/s             0.00        Disk       25.00 iops
MB/s             0.00        MB/s             0.00        Disk       25.00 iops
MB/s             0.00        MB/s             0.00        Disk        0.00 iops

With --interval (seconds between samples) and --repeat (number of samples), a moderate timescale can be set, which allows for a more precise look at the system.

If the cluster-wide IOPS are considerably above the guidelines, the cluster is being overused.
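
To compare the sampled values with the guideline, the disk IOPS lines can be averaged. This is a minimal sketch: the function name avg_disk_iops is hypothetical, and it assumes the pstat line format shown above, where the value is the field before the word "iops".

```shell
# Hypothetical helper: reads "isi statistics pstat list" output on stdin,
# averages the disk IOPS samples and compares the average with the upper
# limit passed as the first argument.
avg_disk_iops() {
    awk -v ceiling="$1" '
    $NF == "iops" {
        v = $(NF - 1) + 0
        sum += v; n++
        if (v > peak + 0) peak = v
    }
    END {
        if (n == 0) { print "no iops samples found"; exit 1 }
        avg = sum / n
        verdict = "within guideline"
        if (avg > ceiling + 0) verdict = "above guideline"
        printf "samples=%d avg=%.2f peak=%.2f ceiling=%d %s\n", n, avg, peak, ceiling, verdict
    }'
}

# Usage, with an upper limit of 360 IOPS:
#   isi statistics pstat list --interval=2 --repeat=10 | avg_disk_iops 360
```

The peak value is printed alongside the average because short bursts can hide behind a low average.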
 

Additional Information

If performance problems are observed while the overall cluster health appears to be OK, contact Dell Support for assistance with further investigation.

The above steps are a top-level procedure intended only as a guideline.

Affected Products

Isilon

Article Properties
Article Number: 000008978
Article Type: How To
Last Modified: 22 May 2025
Version:  7