Symptoms
Dell PowerEdge servers equipped with add-in PCIe network adapters may report an unknown health status for those adapters in the iDRAC9 management interfaces when running iDRAC9 5.xx.xx.xx firmware. When this condition occurs, the Lifecycle Log records an HWC8607 error alert.
Impacted iDRAC9 Firmware Versions:
- 5.00.00.00
- 5.00.10.00
- 5.00.10.20
- 5.00.20.00
- 5.10.00.00
- 5.10.10.00
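As a quick way to screen an inventory against the list above, the check can be sketched in Python. This is a minimal sketch, not a Dell tool: the function name `is_impacted` is hypothetical, and it assumes the firmware version string has already been collected by some other means and uses the same dotted format as the list.

```python
# Impacted iDRAC9 firmware versions, copied from the list above.
IMPACTED_VERSIONS = {
    "5.00.00.00",
    "5.00.10.00",
    "5.00.10.20",
    "5.00.20.00",
    "5.10.00.00",
    "5.10.10.00",
}


def is_impacted(version: str) -> bool:
    """Return True if the version string exactly matches a listed release.

    Hypothetical helper: surrounding whitespace is ignored, nothing else
    is normalized.
    """
    return version.strip() in IMPACTED_VERSIONS


print(is_impacted("5.00.10.20"))  # True
print(is_impacted("5.10.30.00"))  # False
```

An exact-match lookup is used on purpose: it flags only the releases named in this article and makes no claim about versions that are not listed.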
In the iDRAC9 user interface (UI), an impacted network adapter with an unknown health status is shown with a question mark ('?').
Example:
iDRAC9 UI > System > Network Devices
Lifecycle Log Error Example:
2022-01-11 23:25:16 3223 HWC8607 The data communication with the device NIC in Slot 2 running on the port 4 is lost.
2022-01-11 23:25:04 3222 HWC8607 The data communication with the device NIC in Slot 2 running on the port 2 is lost.
2022-01-11 23:24:46 3219 HWC8607 The data communication with the device NIC in Slot 2 running on the port 3 is lost.
2022-01-11 23:20:06 3216 HWC8607 The data communication with the device NIC in Slot 2 running on the port 1 is lost.
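When many servers are being checked, exported Lifecycle Log text can be scanned for these entries. The following Python sketch extracts the slot and port from each HWC8607 line. It assumes the log lines look exactly like the example above (date, time, a numeric field, the message ID, then the message text); the regular expression and the function name `lost_ports_by_slot` are illustrative, not part of any Dell tooling.

```python
import re
from collections import defaultdict

# Pattern modeled on the example lines above. The field layout is an
# assumption taken from that example only.
HWC8607_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<seq>\d+) HWC8607 .*?Slot (?P<slot>\d+) "
    r"running on the port (?P<port>\d+) is lost\.$"
)


def lost_ports_by_slot(lines):
    """Map each slot number to the sorted list of ports reported as lost."""
    found = defaultdict(set)
    for line in lines:
        match = HWC8607_RE.match(line.strip())
        if match:
            found[int(match["slot"])].add(int(match["port"]))
    return {slot: sorted(ports) for slot, ports in found.items()}


SAMPLE = [
    "2022-01-11 23:25:16 3223 HWC8607 The data communication with the device NIC in Slot 2 running on the port 4 is lost.",
    "2022-01-11 23:25:04 3222 HWC8607 The data communication with the device NIC in Slot 2 running on the port 2 is lost.",
    "2022-01-11 23:24:46 3219 HWC8607 The data communication with the device NIC in Slot 2 running on the port 3 is lost.",
    "2022-01-11 23:20:06 3216 HWC8607 The data communication with the device NIC in Slot 2 running on the port 1 is lost.",
]

print(lost_ports_by_slot(SAMPLE))  # {2: [1, 2, 3, 4]}
```

Lines that do not match the pattern are skipped, so the function can be fed a full log export without pre-filtering.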
Cause
iDRAC9 5.00.00.00 firmware introduced PCIeVDM side-band management support for Dell PCIe network adapters. Under certain conditions, the PCIeVDM queues on the iDRAC9 may fill and prevent iDRAC from sending any additional commands to the adapter.
Resolution
iDRAC9 firmware version 5.10.30.00 (June 2022) corrects the conditions that lead to this issue.
Workarounds:
Disabling PCIeVDM on the iDRAC9 reverts the management of the installed PCIe network adapters back to the SMBus interface without any impact to network device management. Disabling PCIeVDM and rebooting iDRAC recovers from this condition and prevents additional occurrences.
To disable PCIeVDM on the iDRAC9, use the following RACADM commands:
racadm>>racadm set iDRAC.PCIeVDM.Enable Disabled
[Key=iDRAC.Embedded.1#PCIeVDM.1]
Object value modified successfully
racadm>>racadm racreset
RAC reset operation initiated successfully. It may take a few
minutes for the RAC to come online again.
Note: Disabling PCIeVDM on the iDRAC9 controller disables the rebootless NVMe firmware feature that was introduced in iDRAC9 5.10.00.00 firmware.
iDRAC9 firmware updates do not modify user-defined attribute settings. After iDRAC9 5.10.30.00 is applied to an impacted server, PCIeVDM remains disabled and must be re-enabled manually with the following RACADM commands:
racadm>>racadm set iDRAC.PCIeVDM.Enable Enabled
[Key=iDRAC.Embedded.1#PCIeVDM.1]
Object value modified successfully
racadm>>racadm racreset
RAC reset operation initiated successfully. It may take a few
minutes for the RAC to come online again.
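The disable and re-enable sequences differ only in the attribute value, so they can be scripted from one helper. The Python sketch below builds the two commands shown in the transcripts and hands them to a runner. It assumes a `racadm` executable is available on the machine running the script, which this article does not state (the transcripts use an interactive racadm prompt); the function names `pcievdm_commands` and `apply_pcievdm` are hypothetical.

```python
import subprocess


def pcievdm_commands(enable: bool):
    """Return the RACADM command sequence from this article as argument lists."""
    state = "Enabled" if enable else "Disabled"
    return [
        ["racadm", "set", "iDRAC.PCIeVDM.Enable", state],
        # The iDRAC reset follows the attribute change, as in the transcripts.
        ["racadm", "racreset"],
    ]


def apply_pcievdm(enable: bool, run=subprocess.run):
    """Run each command in order and stop on the first failure.

    Assumption: racadm is on PATH. The runner is injectable so the
    sequence can be dry-run without touching an iDRAC.
    """
    for command in pcievdm_commands(enable):
        run(command, check=True)


# Dry run: print the workaround sequence instead of executing it.
for command in pcievdm_commands(enable=False):
    print(" ".join(command))
```

Calling `apply_pcievdm(False)` would apply the workaround, and `apply_pcievdm(True)` would restore PCIeVDM after the firmware update. Either call resets the iDRAC, so the management interface is unavailable for a few minutes afterward.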
Affected Products
iDRAC9, iDRAC9 - 5.xx Series