PowerFlex 5.X: Latency Reported on Device
Summary: An alert is raised when a device's latency crosses the warning or error threshold.
Symptoms
Alert message
Warning
A Device on Storage Node <Node Name> (ID: <Node ID>), Device: <Device Path> has crossed the warning threshold for acceptable latency
Error
A Device on Storage Node <Node Name> (ID: <Node ID>), Device: <Device Path> has crossed the error threshold for acceptable latency
Alert thresholds
Warning - 10 seconds
Error - 20 seconds
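The two thresholds split measured latency into three bands. The sketch below classifies a latency value against them. It assumes that a value equal to a threshold counts as having crossed it, because the alert text does not say whether the comparison is strict. The helper name classify_latency is hypothetical and is not part of PowerFlex.

```python
# Thresholds from the alert definition above, in seconds.
WARNING_THRESHOLD_S = 10.0
ERROR_THRESHOLD_S = 20.0


def classify_latency(latency_s: float) -> str:
    """Return the alert level for a latency value (hypothetical helper).

    Assumption: reaching a threshold counts as crossing it.
    """
    if latency_s >= ERROR_THRESHOLD_S:
        return "Error"
    if latency_s >= WARNING_THRESHOLD_S:
        return "Warning"
    return "OK"
```

For example, classify_latency(12.5) returns "Warning" and classify_latency(25.0) returns "Error".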
Impact
High device latency may cause repeated DGWT failures and I/O errors, and may leave a Metadata Unit (MU) in a failed state.
Cause
Device latency can have several causes, including the operating system (OS), PowerFlex code, firmware (FW), and hardware (HW).
Resolution
Validate the state and health of the device and, if necessary, replace it.
Latency can be assessed with I/O tools such as dd, fio, or vdbench. Device health and performance can be checked with Linux utilities such as sar (I/O statistics) and smartctl (SMART health data), or with the PowerFlex SCLI.
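As an example of using fio, the sketch below builds a read-only 4 KiB random-read job against the device, requests JSON output, and extracts the mean completion latency. The job parameters (block size, queue depth, runtime) are illustrative assumptions, not values from this article, and the helper names are hypothetical. The JSON keys used (jobs, read, clat_ns) are those produced by fio 3.x. Even a read-only job adds load to the device, so run it during a quiet period where possible.

```python
import json
import subprocess


def build_fio_command(device: str, runtime_s: int = 30) -> list[str]:
    """Build a read-only random-read fio job (illustrative parameters)."""
    return [
        "fio",
        "--name=latency-check",
        f"--filename={device}",
        "--readonly",          # fio safety check: refuse writes and trims
        "--rw=randread",
        "--bs=4k",
        "--iodepth=1",         # queue depth 1 isolates per-I/O latency
        "--direct=1",          # bypass the page cache
        "--ioengine=libaio",
        f"--runtime={runtime_s}",
        "--time_based",
        "--output-format=json",
    ]


def mean_read_latency_ms(fio_json: str) -> float:
    """Extract the mean read completion latency (ms) from fio 3.x JSON output."""
    report = json.loads(fio_json)
    clat_ns = report["jobs"][0]["read"]["clat_ns"]["mean"]
    return clat_ns / 1_000_000


def measure_device(device: str, runtime_s: int = 30) -> float:
    """Run fio against the device and return its mean read latency in ms."""
    result = subprocess.run(
        build_fio_command(device, runtime_s),
        capture_output=True, text=True, check=True,
    )
    return mean_read_latency_ms(result.stdout)
```

Usage: measure_device("/dev/<device>") runs the job and returns the mean read latency in milliseconds. Queue depth 1 is chosen so the result reflects the latency of individual I/Os rather than throughput under load.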
Possible scenarios:
- The device is under high resource utilization due to an intensive workload
- The device has an unsupported HW type
- The device has an unsupported FW version
- The device has an actual HW issue, such as bad sectors or an error state
To validate the device state and other device details with the PowerFlex SCLI:
# scli --query_device --device_id bb5e945300050009
Device ID: bb5e945300050009 Name: DGWT_Node6--0000:e4:00.0-nvme-1
Path: /dev/disk/by-path/pci-0000:e4:00.0-nvme-1
Capacity: 3.5 TB (3576 GB)
DGWT Id: c14bba1400000005
Node Id: b8ad8a9800000005
Device Group Id: 2e16482200000000
Current Path: /dev/disk/by-path/pci-0000:e4:00.0-nvme-1
Error: No
Bandwidth:
Primary-reads 0 IOPS 0 Bytes per second
Primary-writes 0 IOPS 0 Bytes per second
Secondary-reads 0 IOPS 0 Bytes per second
Secondary-writes 0 IOPS 0 Bytes per second
Backward-rebuild-reads 0 IOPS 0 Bytes per second
Backward-rebuild-writes 0 IOPS 0 Bytes per second
Forward-rebuild-reads 0 IOPS 0 Bytes per second
Forward-rebuild-writes 0 IOPS 0 Bytes per second
Rebalance-reads 0 IOPS 0 Bytes per second
Rebalance-writes 0 IOPS 0 Bytes per second
Volume-migration-reads 0 IOPS 0 Bytes per second
Volume-migration-writes 0 IOPS 0 Bytes per second
Enter-protected-maintenance-mode-reads 0 IOPS 0 Bytes per second
Enter-protected-maintenance-mode-writes 0 IOPS 0 Bytes per second
Exit-protected-maintenance-mode-reads 0 IOPS 0 Bytes per second
Exit-protected-maintenance-mode-writes 0 IOPS 0 Bytes per second
State: Normal
Device HW checks enabled: TRUE
Physical Device Information:
Device Type: UNKNOWN
Media Type: SSD
Vendor Name: N/A
Model Name: N/A
Serial Number: N/A
Slot Number: N/A
Firmware Version: N/A
Cache Look-ahead: not Active
Write Cache: not Active
ATA Security: not Active
Logical Sector Size: 0 B
Physical Sector Size: 0 B
Capacity: 0 GB
LED Setting: OFF
SMART Information:
Aggregated State: NEVER_FAILED
Temperature State: NEVER_FAILED
Current Value: 0 Worst Value: 0 Threshold: 0
Media Wearout Indicator State: NEVER_FAILED
Current Value: 0 Worst Value: 0 Threshold: 0
RAID Controller Information:
Serial Number: N/A
RAID vDisk status: N/A
RAID vDisk Type: N/A
RAID vDisk Cache: N/A
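When many devices must be checked, the output above can be parsed instead of read by eye. A minimal sketch, assuming the "Key: value" layout shown in the example, parses the scli --query_device output and flags the two fields most relevant here: Error (expected No) and State (expected Normal, as in the example). Other State values are not enumerated in this article, so anything other than Normal is simply reported. When a key repeats (for example Capacity), only the first occurrence is kept. The helper names are hypothetical.

```python
import subprocess


def parse_scli_device(text: str) -> dict[str, str]:
    """Parse `scli --query_device` output into a key/value dict.

    Lines are split on the first ": "; section headers such as
    "Bandwidth:" and counter lines without ": " are skipped.
    When a key repeats, the first occurrence wins.
    """
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(": ")
        if sep:
            fields.setdefault(key, value.strip())
    return fields


def device_concerns(fields: dict[str, str]) -> list[str]:
    """Return concerns for the Error and State fields."""
    concerns = []
    if fields.get("Error", "No") != "No":
        concerns.append(f"Error flag set: {fields['Error']}")
    if fields.get("State", "Normal") != "Normal":
        concerns.append(f"Device state is {fields['State']}")
    return concerns


def query_device(device_id: str) -> dict[str, str]:
    """Run the SCLI query shown above and parse its output."""
    result = subprocess.run(
        ["scli", "--query_device", "--device_id", device_id],
        capture_output=True, text=True, check=True,
    )
    return parse_scli_device(result.stdout)
```

Usage: device_concerns(query_device("bb5e945300050009")) returns an empty list for a device in the state shown in the example output.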
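To validate the device state and other device details with smartctl:
sudo smartctl -a /dev/<device>
The -a option prints the device information, the overall health assessment, and the SMART health data referenced below; -i alone prints only the device information.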
For HW and FW details, look for the following fields in the output (exact field names vary by drive type):
- Product - The drive model identifier
- Vendor - The hardware vendor. Some devices report it explicitly; others only through the Product name.
- Revision - The FW version running on the drive
- Device Type - Whether the device is an HDD, SSD, or NVMe drive
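The HW and FW checks can be scripted by extracting the model and firmware fields from the smartctl information section and comparing them with the versions listed in the support matrix. SAS/SCSI drives report Vendor, Product, and Revision; ATA drives report Device Model and Firmware Version; NVMe drives report Model Number and Firmware Version, so the sketch checks each name. The SUPPORTED_FIRMWARE table and the model string in it are hypothetical placeholders; populate them from the PowerFlex compatibility documentation for your release.

```python
from typing import Optional

MODEL_KEYS = ("Product", "Device Model", "Model Number")
FIRMWARE_KEYS = ("Revision", "Firmware Version")

# Hypothetical placeholder: fill in from the compatibility matrix for your release.
SUPPORTED_FIRMWARE: dict[str, set[str]] = {
    "EXAMPLE-MODEL-123": {"1.0.0", "1.1.0"},
}


def parse_identity(text: str) -> dict[str, str]:
    """Parse "Key: value" lines from the smartctl information section."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if sep and key.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


def first_present(fields: dict[str, str], keys: tuple[str, ...]) -> Optional[str]:
    """Return the value of the first key that appears in fields."""
    for key in keys:
        if key in fields:
            return fields[key]
    return None


def check_hw_fw(text: str) -> str:
    """Compare the reported model and FW against the (hypothetical) support table."""
    fields = parse_identity(text)
    model = first_present(fields, MODEL_KEYS)
    firmware = first_present(fields, FIRMWARE_KEYS)
    if model is None or firmware is None:
        return "model or firmware not reported"
    if model not in SUPPORTED_FIRMWARE:
        return f"model {model} not in support table"
    if firmware not in SUPPORTED_FIRMWARE[model]:
        return f"firmware {firmware} not listed for {model}"
    return "ok"
```

Because smartctl prints different field names per protocol, looking up a small list of candidate names keeps one function usable for SAS, SATA, and NVMe devices.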
For bad sectors, look for the following SMART attributes in the output (these are reported by ATA/SATA drives; SAS and NVMe drives report media health in a different format):
- Reallocated_Sector_Ct - The number of bad sectors that have been replaced with spare sectors.
  - A nonzero value indicates that the drive has encountered bad sectors.
- Current_Pending_Sector - The number of sectors waiting to be retested or reallocated.
  - A nonzero value suggests that the drive may still have unreadable sectors.
- Offline_Uncorrectable - The number of sectors that could not be corrected during offline scanning.
  - A high value here is a red flag.
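These ATA attributes (IDs 5, 197, and 198) appear in the attribute table printed by smartctl -A or -a. A minimal sketch, assuming the standard ten-column table layout, reads the raw value of each attribute and flags any nonzero count. Flagging any nonzero value is a conservative choice of this sketch; the guidance above treats only a high Offline_Uncorrectable count as a red flag. The helper names are hypothetical.

```python
WATCHED_ATTRIBUTES = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_ata_attributes(text: str) -> dict[str, int]:
    """Return raw values for the watched attributes from smartctl output.

    Expected row layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    values: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and any non-table lines.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        name, raw = parts[1], parts[9]
        if name in WATCHED_ATTRIBUTES and raw.isdigit():
            values[name] = int(raw)
    return values


def bad_sector_flags(text: str) -> list[str]:
    """List watched attributes whose raw value is nonzero."""
    return [
        f"{name}={value}"
        for name, value in parse_ata_attributes(text).items()
        if value > 0
    ]
```

The raw value is taken from the tenth column only, because some attributes (such as temperature) append extra text like "(Min/Max 20/45)" after the number.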
For a read-only or failed state, look for the following in the output:
- Read-only mode: Enabled
- SMART overall-health self-assessment test result: FAILED (this may indicate that the drive has been forced into read-only mode)
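Both checks are simple string matches, so they are easy to run across many devices. The sketch below searches smartctl output for the two indicators listed above. It accepts any result value that starts with FAILED, since smartctl may append punctuation to it. It does not cover other ways a drive may report a degraded state, and the function name health_flags is hypothetical.

```python
def health_flags(text: str) -> list[str]:
    """Return read-only or failed-state indicators found in smartctl output."""
    flags = []
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Read-only mode" and value == "Enabled":
            flags.append("read-only mode enabled")
        elif (key == "SMART overall-health self-assessment test result"
              and value.startswith("FAILED")):
            flags.append("overall health FAILED (may indicate forced read-only)")
    return flags
```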
Impacted Versions
PowerFlex 5.x