PowerFlex ESXi Dell PERC H730 or H730P RAID Controller Issue

Summary: PowerFlex disk devices fail randomly or report errors. ESXi also logs errors on the disk devices, along with LSI RAID controller errors (SCSI abort and reset commands).


Symptoms

Scenario:
Only ESXi 5.5 and 6.0 are affected.
The ScaleIO system reports disk device errors, and disk devices fail randomly. Once a disk device is back "online," the system continues working as expected until the next disk device failure.

Data unavailability might occur when disk devices fail on different SDSs.

Symptoms: 
ScaleIO system events report data degraded:

2016-01-22 17:28:35.213 MDM_DATA_DEGRADED ERROR The system is now in DEGRADED state.

ScaleIO system events report disk device errors or failure:

799 2016-01-22 17:28:39.818 SDS_DEV_ERROR_REPORT ERROR Device error reported on SDS: 10.3.1.21, Device: /dev/sdb. State: NORMAL upDownState: UP processState: DEV_ERR_INPROGRESS devErrState: REPORT
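Because data unavailability becomes a risk when devices fail on different SDSs, it can help to group the device error events by SDS. The following is a minimal Python sketch, assuming the events have been exported as text lines in the format shown above; the function name `devices_with_errors` and the sample list are illustrative and not part of any ScaleIO tool.

```python
import re
from collections import defaultdict

# Pattern derived from the SDS_DEV_ERROR_REPORT event shown above.
# Assumption: other ScaleIO versions may format this event differently.
DEV_ERROR_RE = re.compile(
    r"SDS_DEV_ERROR_REPORT\s+ERROR\s+Device error reported on SDS:\s*"
    r"(?P<sds>[^,\s]+),\s*Device:\s*(?P<device>\S+?)\.\s+State:"
)


def devices_with_errors(lines):
    """Map each SDS address to the set of devices it reported errors on."""
    errors = defaultdict(set)
    for line in lines:
        match = DEV_ERROR_RE.search(line)
        if match:
            errors[match.group("sds")].add(match.group("device"))
    return dict(errors)


# Sample input: the event line quoted in this article.
SAMPLE_EVENTS = [
    "799 2016-01-22 17:28:39.818 SDS_DEV_ERROR_REPORT ERROR Device error "
    "reported on SDS: 10.3.1.21, Device: /dev/sdb. State: NORMAL "
    "upDownState: UP processState: DEV_ERR_INPROGRESS devErrState: REPORT",
]

if __name__ == "__main__":
    for sds, devices in devices_with_errors(SAMPLE_EVENTS).items():
        print(sds, sorted(devices))
```

If the resulting mapping contains more than one SDS, device errors are spread across SDSs, which is the situation where data unavailability might occur.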

VMkernel logs example:

2016-01-26 13:48:09.254 2016-01-26T13:50:35.576Z esxi-vcd-compute-03 vmkernel: cpu11:301656)VSCSI: 2590: handle 8194(vscsi0:0):Reset request on FSS handle 279624493 (0 outstanding commands) from (vmm0:scsi-test-4)
2016-01-26 13:48:09.254 2016-01-26T13:50:35.576Z esxi-vcd-compute-03 vmkernel: cpu11:32946)VSCSI: 2868: handle 8194(vscsi0:0):Reset [Retries: 0/0]from (vmm0:scsi-test-4)
2016-01-26 13:48:09.255 2016-01-26T13:50:35.576Z esxi-vcd-compute-03 vmkernel: cpu11:32946)lsi_mr3: mfi_TaskMgmt:254: Processing taskMgmt virt reset for device: vmhba0:C2:T0:L0
2016-01-26 13:48:09.255 2016-01-26T13:50:35.576Z esxi-vcd-compute-03 vmkernel: cpu11:32946)lsi_mr3: mfi_TaskMgmt:258: VIRT_RESET cmd # 2514475
2016-01-26 13:48:09.255 2016-01-26T13:50:35.576Z esxi-vcd-compute-03 vmkernel: cpu11:32946)lsi_mr3: mfi_TaskMgmt:262: ABORT

Second VMkernel log example:

2015-09-04T14:05:58.860Z cpu20:32859)lsi_mr3: mfi_TaskMgmt:254: Processing taskMgmt virt reset for device: vmhba0:C2:T0:L0
2015-09-04T14:05:58.860Z cpu20:32859)lsi_mr3: mfi_TaskMgmt:258: VIRT_RESET cmd # 28267605
2015-09-04T14:05:58.860Z cpu20:32859)lsi_mr3: mfi_TaskMgmt:262: ABORT
2015-09-04T14:05:58.864Z cpu8:33188)lsi_mr3: fusionWaitForOutstanding:2516: megasas: [ 0]waiting for 1 commands to complete
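To check whether a host shows the same signature, the VMkernel log can be scanned for the `lsi_mr3` task-management reset messages shown above. The following is a minimal Python sketch, assuming the log has already been read into a list of lines; the function name `count_virt_resets` is hypothetical, and the pattern is based only on the excerpts in this article.

```python
import re
from collections import Counter

# Matches the lsi_mr3 "virt reset" message and captures the device path,
# for example vmhba0:C2:T0:L0. The number after mfi_TaskMgmt: is left
# flexible because it may differ between driver builds (assumption).
RESET_RE = re.compile(
    r"lsi_mr3: mfi_TaskMgmt:\d+: Processing taskMgmt virt reset for device: "
    r"(?P<device>vmhba\d+:C\d+:T\d+:L\d+)"
)


def count_virt_resets(lines):
    """Count lsi_mr3 virt reset messages per device path."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts


# Sample input: lines taken from the second log excerpt in this article.
SAMPLE_VMKERNEL = [
    "2015-09-04T14:05:58.860Z cpu20:32859)lsi_mr3: mfi_TaskMgmt:254: "
    "Processing taskMgmt virt reset for device: vmhba0:C2:T0:L0",
    "2015-09-04T14:05:58.860Z cpu20:32859)lsi_mr3: mfi_TaskMgmt:258: "
    "VIRT_RESET cmd # 28267605",
    "2015-09-04T14:05:58.860Z cpu20:32859)lsi_mr3: mfi_TaskMgmt:262: ABORT",
]

if __name__ == "__main__":
    for device, count in count_virt_resets(SAMPLE_VMKERNEL).items():
        print(device, count)
```

A nonzero count for a device behind the PERC controller suggests the host is seeing the resets described in this article; it does not by itself confirm the root cause.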

SCSI abort commands appear in the messages file on the SVM:

Sep 2 21:35:09 ScaleIO-10-100-7-66 kernel: [28986.521362] sd 5:0:15:0: [sdw] task abort on host 5, ffff880095a00d80
Sep 2 21:35:09 ScaleIO-10-100-7-66 kernel: [28986.782459] sd 4:0:4:0: [sdg] task abort on host 4, ffff8800959d65c0
Sep 2 21:35:09 ScaleIO-10-100-7-66 kernel: [28986.782466] sd 4:0:4:0: [sdg] Failed to abort cmd ffff8800959d65c0
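The abort messages can be summarized per disk to see which devices are affected and whether any aborts failed. The following is a minimal Python sketch, assuming the SVM messages file has been read into a list of lines; the function name `summarize_aborts` is hypothetical, and the pattern covers only the two message forms shown above.

```python
import re

# Captures the disk name in square brackets (for example sdg) and whether
# the message is a task abort or a failed abort. The kernel timestamp in
# brackets is not matched because it does not start with "sd".
ABORT_RE = re.compile(
    r"\[(?P<disk>sd[a-z]+)\] "
    r"(?P<kind>task abort on host \d+|Failed to abort cmd)"
)


def summarize_aborts(lines):
    """Return {disk: {"aborts": n, "failed": m}} for SCSI abort messages."""
    summary = {}
    for line in lines:
        match = ABORT_RE.search(line)
        if not match:
            continue
        entry = summary.setdefault(match.group("disk"), {"aborts": 0, "failed": 0})
        if match.group("kind").startswith("task abort"):
            entry["aborts"] += 1
        else:
            entry["failed"] += 1
    return summary


# Sample input: the three messages quoted in this article.
SAMPLE_MESSAGES = [
    "Sep 2 21:35:09 ScaleIO-10-100-7-66 kernel: [28986.521362] sd 5:0:15:0: "
    "[sdw] task abort on host 5, ffff880095a00d80",
    "Sep 2 21:35:09 ScaleIO-10-100-7-66 kernel: [28986.782459] sd 4:0:4:0: "
    "[sdg] task abort on host 4, ffff8800959d65c0",
    "Sep 2 21:35:09 ScaleIO-10-100-7-66 kernel: [28986.782466] sd 4:0:4:0: "
    "[sdg] Failed to abort cmd ffff8800959d65c0",
]

if __name__ == "__main__":
    for disk, stats in sorted(summarize_aborts(SAMPLE_MESSAGES).items()):
        print(disk, stats)
```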

Impact:

  • Disk devices fail randomly, triggering rebuild and rebalance operations that impact system performance.
  • Data unavailability may occur.
  • When a disk device stops responding for any reason, the SVM operating system takes the disk device offline.

Cause

Driver or firmware issue.
This is a known issue with Dell PERC H730 controllers, and VMware is aware of it.

Resolution

Workaround:
Dell's recommendation at the time of the issue was to upgrade to the latest firmware or drivers.

To bring a failed disk device back online, see the article "PowerFlex Unable to clear SDS device errors - device state is offline on OS level".

Affected Products

PowerFlex rack, ScaleIO

Article Properties

Article Number: 000283423
Article Type: Solution
Last Modified: 06 Mar 2025
Version:  3