Isilon: Check your cluster for bottlenecks
Summary: How to identify bottlenecks on your cluster.
This article is not tied to any specific product.
Not all product versions are identified in this article.
Instructions
This article is divided into several sections:
1.) CPU Utilization
2.) Filesystem Utilization
1.) CPU Utilization
Check your cluster for CPU overutilization.
Step 1:
Determine the number of available cores per Node:
isi_for_array isi_hw_status | grep -i proc
Isilon8005-1: PROC: Single-proc, -core
Isilon8005-1#
Note down the number of processors per node and the number of cores per processor (quad, hexa, octa), and multiply the two. The product is the upper margin the CPU load can reach.
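As an illustration, the arithmetic can be scripted. The sample line below is hypothetical (real isi_hw_status output varies by node model):

```shell
# Hypothetical isi_hw_status line; real output varies by node model.
sample="Isilon8005-1: PROC: Single-proc, quad-core"

# Processors per node: single-proc = 1, otherwise assume dual-proc.
procs=$(echo "$sample" | grep -qi 'single-proc' && echo 1 || echo 2)

# Cores per processor from the word form.
cores=0
case "$sample" in
  *quad-core*) cores=4 ;;
  *hexa-core*) cores=6 ;;
  *octa-core*) cores=8 ;;
esac

# Upper load margin = processors per node * cores per processor.
echo "Upper load margin: $((procs * cores))"
```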
Step 2:
Check the current utilization:
isi_for_array uptime | grep -i load | awk '{print $1"\nLoad average last 1 minute: "$(NF-2)"\nLoad average last 5 minutes: "$(NF-1)"\nLoad average last 15 minutes: "$NF}'
Isilon8005-1:
Load average last 1 minute: 1.57,
Load average last 5 minutes: 1.58,
Load average last 15 minutes: 1.29
Isilon8005-1#
This shows the utilization on a per-node basis over the given timespans (uptime reports the 1-, 5-, and 15-minute load averages, in that order). The individual numbers should not be higher than the core count (see above).
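As a sketch, the comparison against the core count can be automated. The load and core values below are hypothetical:

```shell
# Hypothetical per-node values; on a real cluster these come from the
# uptime load averages and the core count derived from isi_hw_status.
load_1min=1.57
cores=4

# awk handles the floating-point comparison the shell cannot do natively.
status=$(awk -v load="$load_1min" -v cores="$cores" 'BEGIN {
  if (load > cores) print "overutilized"
  else              print "within CPU margin"
}')
echo "Node is $status"
```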
If the cluster is running 8.0.0.3 or above, the following statistics command can be used:
isi statistics query current list --nodes=all --keys=node.load.15min,node.load.5min,node.load.1min,node.cpu.count
Node node.load.15min node.load.5min node.load.1min node.cpu.count
----------------------------------------------------------------------
1 159 165 173 2
----------------------------------------------------------------------
Total: 1
This output shows the current load in hundredths relative to the CPU (core) count: a value of 159 corresponds to a load average of 1.59.
Ideally, the values divided by 100 should be less than the number of available cores (node.cpu.count).
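As an illustration, the hundredths conversion can be scripted; the row below uses the sample values from the output above:

```shell
# Sample row from the statistics output above: node number, load values
# in hundredths (15min, 5min, 1min), then the core count.
row="1 159 165 173 2"

# Divide by 100 to get conventional load averages.
echo "$row" | awk '{
  printf "Node %s: 15min=%.2f 5min=%.2f 1min=%.2f (cores: %d)\n",
         $1, $2/100, $3/100, $4/100, $5
}'
```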
Filesystem Utilization
This section describes a top-level file system check with a focus on hard drives:
Step 1:
isi statistics drive list --long --type={SAS|SSD|SATA}
Timestamp Drive Type OpsIn BytesIn SizeIn OpsOut BytesOut SizeOut TimeAvg Slow TimeInQ Queued Busy Used Inodes
------------------------------------------------------------------------------------------------------------------------------
1529750006 1:2 SAS 2.2 15.0k 6.8k 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1% 10.0% 3.4k
1529750006 1:3 SAS 2.4 19.7k 8.2k 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1% 9.7% 3.3k
1529750006 1:4 SAS 1.6 14.7k 9.2k 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0% 12.3% 3.3k
1529750006 1:5 SAS 1.0 8.2k 8.2k 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0% 14.7% 3.3k
1529750006 1:6 SAS 2.0 16.4k 8.2k 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0% 11.3% 3.3k
1529750006 1:7 SAS 1.2 27.9k 23.2k 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0% 11.8% 3.3k
------------------------------------------------------------------------------------------------------------------------------
This command lists per-drive statistics for the selected disk type (SAS, SATA, or SSD). To narrow the analysis, the following columns are the most important:
- Used
- Busy
- Queued
- TimeInQ
Step 2:
Sort the output by the Busy column to surface the most loaded drives:
isi statistics drive list --long --type=SAS --sort=busy --nodes=all
Timestamp Drive Type OpsIn BytesIn SizeIn OpsOut BytesOut SizeOut TimeAvg Slow TimeInQ Queued Busy Used Inodes
------------------------------------------------------------------------------------------------------------------------------
1529750499 1:2 SAS 1.4 8.4k 6.0k 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0% 10.0% 3.4k
1529750499 1:3 SAS 1.0 5.1k 5.1k 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0% 9.7% 3.3k
1529750499 1:4 SAS 2.0 13.3k 6.7k 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0% 12.4% 3.3k
1529750499 1:5 SAS 1.6 10.0k 6.3k 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0% 14.7% 3.3k
1529750499 1:6 SAS 1.0 6.7k 6.7k 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0% 11.3% 3.3k
1529750499 1:7 SAS 1.4 19.9k 14.2k 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0% 11.8% 3.3k
------------------------------------------------------------------------------------------------------------------------------
Available sort options: access_latency, access_slow, busy, bytes_in, bytes_out, drive_id, iosched_latency, iosched_queue, time, type, used_bytes_percent, used_inodes, xfer_size_in, xfer_size_out, xfers_in, xfers_out
Note: The disk busy rate should not constantly exceed 90%.
General guidelines:
- TimeInQ and Queued should not be constantly over 10 for SATA drives; SAS and SSD drives can sustain higher values.
- The Used column should not exceed 90%.
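A quick way to spot offending drives is to filter on those columns. This sketch uses hypothetical, trimmed rows (Drive, TimeInQ, Queued, Busy, Used, without percent signs) rather than real isi output:

```shell
# Hypothetical trimmed drive rows: Drive TimeInQ Queued Busy Used.
# Real rows come from `isi statistics drive list`.
drives="1:2 0.0 0.0 0.1 10.0
1:3 12.5 15.0 95.2 9.7
1:4 0.0 0.0 0.0 12.3"

# Flag drives that break the guidelines: TimeInQ or Queued over 10,
# or Busy or Used over 90%.
echo "$drives" | awk '$2 > 10 || $3 > 10 || $4 > 90 || $5 > 90 {
  print $1 " exceeds guidelines"
}'
```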
Step 3:
Evaluate cluster-wide IOPs to ascertain if the solution is being overused.
The following guidelines are commonly accepted for SATA drives:
- Random I/O workload: 60 iops per drive
- Mixed I/O workload: 80 iops per drive
- Large I/O workload: 100 iops per drive
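For example, with 6 drives (matching the drive count found in the next step) and the mixed-workload guideline, the ceiling works out as:

```shell
# Illustrative values: 6 SATA drives, mixed-workload guideline of
# 80 IOPS per drive.
drives=6
iops_per_drive=80

# Upper margin for the overall cluster IOPS.
echo "Cluster IOPS ceiling: $((drives * iops_per_drive))"
```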
The first requirement is to get the number of drives in the cluster:
isi statistics drive list --type=SAS --nodes=all | grep Total
Total: 6
Multiply the number of drives by the workload guideline to get the upper margin for the overall cluster IOPS.
The current cluster iops can be taken with the following command:
isi statistics pstat list --interval=2 --repeat=10 | grep iops
MB/s 0.00 MB/s 0.00 Disk 1.80 iops
MB/s 0.00 MB/s 0.00 Disk 1.80 iops
MB/s 0.00 MB/s 0.00 Disk 1.80 iops
MB/s 0.00 MB/s 0.00 Disk 58.80 iops
MB/s 0.00 MB/s 0.00 Disk 58.80 iops
MB/s 0.00 MB/s 0.00 Disk 58.80 iops
MB/s 0.00 MB/s 0.00 Disk 25.00 iops
MB/s 0.00 MB/s 0.00 Disk 25.00 iops
MB/s 0.00 MB/s 0.00 Disk 25.00 iops
MB/s 0.00 MB/s 0.00 Disk 0.00 iops
With --interval and --repeat, a moderate timescale can be set, which allows for a more precise look at the system. If the cluster-wide IOPS are considerably above the guidelines, the cluster is being overused.
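The repeated samples can be averaged and compared against the ceiling. The readings and ceiling below are hypothetical:

```shell
# Hypothetical disk IOPS samples, as grepped from `isi statistics pstat`.
samples="1.80 58.80 25.00 0.00"
ceiling=480   # e.g. 6 drives * 80 IOPS (mixed workload)

# Average the samples and flag overuse.
echo "$samples" | tr ' ' '\n' | awk -v ceiling="$ceiling" '
  { sum += $1; n++ }
  END {
    printf "Average disk IOPS: %.2f (ceiling: %d)\n", sum / n, ceiling
    if (sum / n > ceiling) print "Cluster is being overused"
  }'
```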
Additional Information
If performance problems are observed while the overall cluster health appears to be fine, contact Dell Support for assistance with further investigation.
The above steps are a top-level procedure and are intended as a guideline only.
Affected Products
Isilon
Article Properties
Article Number: 000008978
Article Type: How To
Last Modified: 22 May 2025
Version: 7