Dell InsightIQ 4.3.0.0 User Guide

Modules in Performance Reports

Modules display pre-configured collections of information about the monitored cluster.

Table 1. Module names and descriptions
Module	Description
Active Clients	Shows the number of unique client addresses that generate protocol traffic on the monitored cluster. Clients that are connected but not generating any traffic are not counted. You can optionally break out this data by protocol or node. For NFS, the recommended maximum number of supported active clients is 1,000 per node. For SMB 2.0 and later, the recommended maximum number of supported active clients is 1,500 per node. NOTE: Some protocols might issue a type of ping message, which can cause clients to appear active even if they are only sending these ping messages.
Average Cached Data Age	Indicates the average amount of time that data has been in the L1 and L2 caches. Shorter times indicate that data is moving faster through cache and that older data is being replaced quickly.
Average Disk Hardware Latency	Shows the average amount of time that it takes for the physical disk hardware to service an operation or transfer. You can optionally break out this data by node or disk.
Average Disk Operation Size	Shows the average size of the operations or transfers that the disks in the cluster are servicing. You can optionally break out this data by direction, node, or disk.
Average Pending Disk Operations Count	Shows the average number of operations or transfers that are in the processing queue for each disk in the cluster. You can optionally break out this data by node or disk. This module focuses on the queue depth of disk operations. Pending disk operation counts of 10 or less usually mean that the drives are operating as intended. Counts higher than 10 could indicate backlogs in disk operations, sometimes resulting in performance issues with the cluster. The graph is most meaningful when viewing the disk detail breakout.
Blocking File System Events Rate	Shows the number of file blocking events occurring in the file system per second. You can optionally break out this data by path or node.
Cache Hits	Shows the number of hits to L2 and L3 cache per second. You can optionally break out this data by job, node, or service.
Cluster Capacity	Shows the amount of storage on the cluster. Total Capacity - The total amount of storage capacity on the cluster. Allocated Capacity - The storage capacity of nodes that belong to node pools that include three or more nodes. In the OneFS command-line interface, Allocated Capacity is referred to as "size" in the output of the isi status command. If all node pools include at least three nodes, the Allocated Capacity is the same as the Total Capacity. Writable Capacity - The amount of storage capacity that user data can be written to on the cluster. User Data Including Protection - The amount of storage capacity that is in use by user data and protection.
Connected Clients	Shows the number of unique client addresses with established TCP connections to the cluster on known ports. You can optionally break out this data by protocol or node. NOTE: UDP connections do not appear as connected. Also, some short-lived TCP connections might not appear as connected, although they are active.
Contended File System Events Rate	Shows the number of file contention events, such as lock contention or read/write contention, occurring in the file system per second. You can optionally break out this data by path or node.
CPU % Use	Shows the average CPU usage for all nodes in the monitored cluster. As some nodes may consume significantly more or less CPU resources than others, the average reflects the sum of the individual CPU-usage averages for each node. NOTE: You can optionally break out this data by node. This breakout indicates the average CPU usage of each node. For example, at 10:52:22 AM on October 31, 2022, the specified node was using 14.35% of the total available node CPU capacity.
CPU Usage Rate	Shows the rate of CPU time that is consumed on all cores per second. You can optionally break out this data by job, node, or service.
Deadlocked File System Events Rate	Shows the number of file system deadlock events that the file system is processing per second. This information can be useful if you want to identify a specific file state that might be contributing to performance issues. Deadlocked events occur regularly during normal cluster operation, and the file system is designed to detect and break them. You can optionally break out this data by path or node.
Deduplication Summary (Logical)	Shows the amount of space that deduplication has saved on the cluster and the amount of data that has been deduplicated. This module refers to the logical space and data. The file metadata and protection overhead are not considered.
Deduplication Summary (Physical)	Shows the amount of space that deduplication has saved on the cluster and the amount of data that has been deduplicated. This module refers to the estimated physical space and data. The file metadata and protection overhead are considered.
Disk Activity	Shows the average percentage of time that disks in the cluster spend performing operations instead of sitting idle. You can optionally break out this data by node or disk.
Disk IOPS	Shows the rate of disk read and write operations per second. You can optionally break out this data by job, node, or service.
Disk Operations Rate	Shows the average rate at which the disks in the cluster are servicing data read/write/change requests, also referred to as operations or disk transfers. You can optionally break out this data by disk, direction, or node.
Disk Throughput Rate	Shows the total amount of data being read from and written to the disks in the cluster. You can optionally break out this data by disk, direction, or node.
External Network Errors	Shows the number of errors that are generated for the external network interfaces. You can optionally break out this data by direction, interface, or node. NOTE: During normal operations, this chart indicates an error count of 0. Errors reported in this chart are often the result of network infrastructure issues (for example, malformed frames or headers, handoff errors, or queuing errors) rather than OneFS issues. When investigating the cause of these errors, first review the logs and reports for the network switch and other network infrastructure components.
External Network Packets Rate	Shows the total number of packets that passed through the external network interfaces in the monitored cluster. You can optionally break out this data by direction, interface, or node.
External Network Throughput Rate	Shows the total amount of data that passed through the external network interfaces in the monitored cluster. You can optionally break out this data by interface, direction, client, operation class, protocol, or node. For example, if you have an application that uses a particular SmartConnect zone, you can view the throughput on those nodes to observe that application's performance.
File System Events Rate	Shows the number of file system events, or operations, such as read, write, lookup, or rename, that the file system is servicing per second. You can optionally break out this data by direction, operation class, path, node, or event.
File System Throughput Rate	Shows the rate at which data is being read from and written to the file system.
Job Workers	Shows the number of active and assigned workers on the cluster. An active worker is a worker that is performing a system job. An assigned worker is a worker that has been assigned to a system job but is not currently performing the job. You can optionally break out this data by job name or job ID.
Jobs	Shows the number of active and inactive jobs on the cluster. An active job is a system job that workers perform. An inactive job is a system job that has been assigned workers, but the workers are not currently performing the job. You can optionally break out this data by job name or job ID.
L1 and L2 Cache Prefetch Throughput Rate	Shows the amount of data that was prefetched for L1 and L2 and how much of the prefetched data was requested. Starts - Indicates the amount of data that was requested. Hits - Indicates the amount of requested data that was available.
L1 Cache Throughput Rate	Shows the amount of data that was requested from the L1 cache and how much of the requested data was available in the L1 cache. Starts - Indicates the amount of data that was requested. Hits - Indicates the amount of requested data that was available. Waits - Indicates the amount of requested data that existed in cache but was not available because the data was in use. Misses - Indicates the amount of requested data that did not exist in cache. Prefetch Hits - Indicates the amount of data that was requested from prefetch.
L2 Cache Throughput Rate	Shows the amount of data that was requested from the L2 cache and how much of the requested data was available in the L2 cache. Starts - Indicates the amount of data that was requested. Hits - Indicates the amount of requested data that was available. Waits - Indicates the amount of requested data that existed in cache but was not available because the data was in use. Misses - Indicates the amount of requested data that did not exist in cache. Prefetch Hits - Indicates the amount of data that was requested from prefetch.
L3 Cache Throughput Rate	Shows the amount of data that was requested from the L3 cache and how much of the requested data was available in the L3 cache. Starts - Indicates the amount of data that was requested. Hits - Indicates the amount of requested data that was available. Misses - Indicates the amount of requested data that did not exist in cache.
Locked File System Events Rate	Shows the number of file lock operations occurring in the file system per second. You can optionally break out this data by path or node.
Overall Cache Hit Rate	Shows the percentage of data requests that returned hits. A hit means that the requested data was available in cache. Higher hit rates indicate that more of the requested data is being retrieved from cache. L1 Hit Rate - Indicates the hit rate for L1 cache, which is private to the node servicing input/output requests. L2 Hit Rate - Indicates the hit rate for L2 cache, which is global across all nodes. L3 Hit Rate - Indicates the hit rate for L3 cache, which is read-only and stores data on a solid-state drive (SSD) instead of RAM.
Overall Cache Throughput Rate	Shows the amount of data that was requested from cache and how much of the requested data was available in cache. L1 Starts - Indicates the amount of data that was requested from the L1 cache. L1 Hits - Indicates the amount of requested data that was available in the L1 cache. L2 Starts - Indicates the amount of data that was requested from the L2 cache. L2 Hits - Indicates the amount of requested data that was available in the L2 cache. L3 Starts - Indicates the amount of data that was requested from the L3 cache. L3 Hits - Indicates the amount of requested data that was available in the L3 cache.
Pending Disk Operations Latency	Shows the average amount of time that disk operations spend in the input/output scheduler. You can optionally break out this data by node or disk. This module represents the amount of time that a disk operation remains in the input/output scheduler, not the actual latency of the disk operation.
Protocol Operations Average Latency	Shows the average amount of time that is required for protocols to process incoming operations. These values are typically represented in fractions of seconds. You can optionally break out this data by client, operation class, protocol, or node. For example, view the latency by protocol, then at the operation class for each protocol, to check for specific operations that might be causing excessive latency.
Protocol Operations Rate	Shows the total number of requests per client for all file data access protocols. Combined with data from the disk throughput and disk operations rate data elements, this information can help you identify specific clients that might be contributing to cluster load. You can optionally break out this data by client, operation class, protocol, or node.
Slow Disk Access Rate	Shows the rate at which slow, or long-latency, disk operations occur. You can optionally break out this data by node or disk.
Workload CPU Time	Displays the CPU usage across workloads visible for different datasets. Filtering down with specific metrics can help determine where CPU resources are being consumed.
Workload IOPS	Displays the input/output operations across workloads visible for different datasets. Filtering down with specific metrics can help determine where disk operations are being carried out.
Workload L2/L3 Cache Hits	Displays the rate at which the cache is accessed during various operations on the monitored cluster. Filtering down with specific metrics can help determine what operations are making maximum use of the cache.
Workload Latency	Displays the average amount of time spent by disk operations. Filtering down with specific metrics can help determine where your disk operations might be slowing down.
Workload Throughput	Displays the rate at which data is being read from and written to the file system. Filtering down with specific metrics can help determine where the file transfer rate is peaking.
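
The guidance in the Average Pending Disk Operations Count row (counts of 10 or less usually mean that the drives are operating as intended) can be sketched as a simple check over per-disk values. This is a minimal illustration, not an InsightIQ feature: the function name, the node and disk labels, and the sample values are all hypothetical, and no particular InsightIQ export format is assumed.

```python
# Threshold taken from the module description: counts of 10 or less are usually healthy.
PENDING_OPS_THRESHOLD = 10


def flag_backlogged_disks(pending_ops, threshold=PENDING_OPS_THRESHOLD):
    """Return ((node, disk), count) pairs whose count exceeds the threshold.

    pending_ops is a hypothetical mapping of (node, disk) to the average
    pending disk operations count. Results are sorted from the deepest
    queue to the shallowest.
    """
    flagged = [(key, count) for key, count in pending_ops.items() if count > threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


# Hypothetical per-disk values, as if read from the disk detail breakout.
sample = {
    ("node-1", "bay-3"): 2.5,
    ("node-1", "bay-7"): 14.0,
    ("node-2", "bay-1"): 10.0,   # exactly 10 is not flagged
    ("node-3", "bay-4"): 31.2,
}

for (node, disk), count in flag_backlogged_disks(sample):
    print(f"{node} {disk}: {count} pending operations")
```

A flagged disk is only a starting point for investigation, since the description says that counts higher than 10 could indicate a backlog, not that they always do.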

NOTE: Depending on which version of the OneFS operating system the monitored cluster is running, certain InsightIQ features may not be available.
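
The relationship between the Starts and Hits values in the cache throughput modules and the percentages in the Overall Cache Hit Rate module can be sketched as follows. This assumes that a hit rate is simply hits divided by starts, expressed as a percentage; this guide does not state how InsightIQ computes the value, and the function name and sample figures are hypothetical.

```python
def hit_rate_percent(starts, hits):
    """Return hits as a percentage of starts, or None when nothing was requested.

    Assumption: starts is the amount of data requested from a cache level and
    hits is the amount of that data that was available, in the same units.
    """
    if starts <= 0:
        return None
    return 100.0 * hits / starts


# Hypothetical (starts, hits) pairs for each cache level.
sample = {
    "L1": (800.0, 600.0),
    "L2": (200.0, 50.0),
    "L3": (0.0, 0.0),
}

for level, (starts, hits) in sample.items():
    rate = hit_rate_percent(starts, hits)
    label = "no requests" if rate is None else f"{rate:.1f}%"
    print(f"{level} hit rate: {label}")
```

Returning None for a level with no requests avoids a division by zero and keeps an idle cache level from being reported as a 0% hit rate.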