PowerPath: Common ESXi issues and items to check for troubleshooting

Summary: The purpose of this KB article is to provide common information about ESXi issues and the steps to troubleshoot them.

This article is not tied to any specific product. Not all product versions are identified in this article.

Instructions

Cause
Many things can cause issues on an ESXi host.
This article lists some of the most common issues and their troubleshooting steps.

Resolution

Basic checks
  • Version - Is the version current and still supported?

  • Check the "Known Issues" section of the release notes for common issues, fixes, and JIRA links.

  • Versions for PowerPath can be found in the following locations:

  • PP/rpowermt version

  • File location: host/commands/localcli_software-vib-list.txt

Common Issues and Errors

  • Connectivity
  • Permanent Device Loss
  • All Paths Down
  • PowerPath


Connectivity 

Messages are seen in the vmkernel log and often in the vmkwarning log.

"state in doubt; requested fast path state update"

These messages appear when the Host Bus Adapter (HBA) driver cancels a command because it took longer than the 5-second timeout period to complete. An operation can exceed the timeout period for several reasons, including:

  • Array backup operations (LUN backup, replication, and so on)
  • General overload on the array
  • Read/Write Cache on the array (misconfiguration, lack of cache, and so on)
  • Fabric issues (Bad Inter-Switch Link (ISL), outdated firmware, bad fabric cable/GBIC)
  • High SAN latency 

VMware KB# 1022026

Example:

In the /var/log/vmkernel.log file of the ESXi host, you see entries similar to:

 

<YYYY-MM-DD>T<time> esx12 vmkernel: 116:03:44:19.039 cpu4:4196)<6>qla2xxx 0000:0f:00.0: scsi(6:0:152): Abort command issued -- 1 67a23dcd 2002.

<YYYY-MM-DD>T<time> esx12 vmkernel: 116:03:44:19.039 cpu4:4100)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100020e0b00) to NMP device "sym.029010111831353837" failed on physical path "vmhba2:C0:T0:L152" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

<YYYY-MM-DD>T<time> esx12 vmkernel: 116:03:44:19.039 cpu4:4100)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "sym.029010111831353837" state in doubt; requested fast path state update...
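
When many of these warnings are present, counting them per device can show whether one LUN or the whole host is affected. The following is a minimal Python sketch, not a Dell or VMware tool. The function name `count_state_in_doubt` and the regular expression are illustrative assumptions based only on the message format in the example above.

```python
import re
from collections import Counter

# Assumption: the warning text matches the example above; the device name is
# the quoted string that follows 'NMP device'.
STATE_IN_DOUBT = re.compile(
    r'NMP device "(?P<device>[^"]+)" state in doubt; requested fast path state update'
)


def count_state_in_doubt(lines):
    """Return a Counter mapping device name to number of 'state in doubt' warnings."""
    counts = Counter()
    for line in lines:
        match = STATE_IN_DOUBT.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        'cpu4:4100)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device '
        '"sym.029010111831353837" state in doubt; requested fast path state update...',
        "cpu4:4196)<6>qla2xxx 0000:0f:00.0: scsi(6:0:152): Abort command issued -- 1 67a23dcd 2002.",
    ]
    for device, hits in count_state_in_doubt(sample).most_common():
        print(device, hits)
```

To run it against a real log, pass an open file object for the vmkernel log in place of the `sample` list.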

 

/commands/localcli_storage-core-adapter-stats-get.txt

This file is useful for checking HBA load balance and reservation conflicts.
A large imbalance in successful commands between HBAs can indicate a fixed path policy or other balancing issues.

Reservation conflicts can be indicative of Host Logical Unit (HLU) mismatches on Unity arrays.  

Dell EMC Unity/VNX/CLARiiON: VMware cannot see LUNs correctly if they are in multiple Storage Groups and the HLU does not match (User Correctable)
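
The balance check described above can be sketched in Python as follows. This is a rough illustration only. It assumes the file holds one unindented adapter name per section followed by indented "Key: Value" lines, and that counters named "Successful Commands" and "Reservation Conflicts" are present; verify both against the actual file. The function names are hypothetical.

```python
def parse_adapter_stats(text):
    """Parse sections of the assumed form 'vmhbaN' + indented 'Key: Value' lines."""
    stats = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():
            # Unindented line: treat it as an adapter header (trailing ':' optional).
            current = raw.strip().rstrip(":")
            stats[current] = {}
            continue
        key, sep, value = raw.strip().partition(":")
        value = value.strip()
        if current is not None and sep and value.isdigit():
            stats[current][key.strip()] = int(value)
    return stats


def command_share(stats, counter="Successful Commands"):
    """Return each adapter's fraction of the host-wide total for one counter."""
    total = sum(s.get(counter, 0) for s in stats.values())
    if total == 0:
        return {name: 0.0 for name in stats}
    return {name: s.get(counter, 0) / total for name, s in stats.items()}


# Hypothetical sample data, not taken from a real host.
SAMPLE = """\
vmhba0:
   Successful Commands: 900
   Reservation Conflicts: 0
vmhba1:
   Successful Commands: 100
   Reservation Conflicts: 12
"""

if __name__ == "__main__":
    parsed = parse_adapter_stats(SAMPLE)
    for name, share in command_share(parsed).items():
        conflicts = parsed[name].get("Reservation Conflicts", 0)
        print(f"{name}: {share:.0%} of successful commands, {conflicts} reservation conflicts")
```

What counts as a "large" imbalance is a judgment call, so the sketch reports shares rather than applying a threshold.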
 

/commands/localcli_storage-core-device-stats-get.txt 

This file gives per-LUN statistics and shows which LUNs have reservation conflicts.

 

/commands/localcli_storage-san-fc-stats-get.txt

This file is useful for checking HBA statistics such as:

  • Dumped Frames
  • Link Failures Count
  • Loss of Signal Counts
  • Invalid Tx Word Count
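
A short Python sketch for flagging non-zero values of the counters listed above is shown here. It assumes the file contains "Adapter: vmhbaN" lines followed by "Counter Name: value" lines, and it matches counters by keyword because the exact counter names may differ between ESXi versions. The function name and sample values are hypothetical.

```python
# Keywords for the counters named in the list above (matched case-insensitively).
WATCHED = ("dumped frames", "link failure", "loss of signal", "invalid tx word")


def nonzero_error_counters(text):
    """Return (adapter, counter, value) tuples for watched counters above zero."""
    findings = []
    adapter = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key.lower() == "adapter":
            adapter = value
        elif value.isdigit() and int(value) > 0:
            if any(word in key.lower() for word in WATCHED):
                findings.append((adapter, key, int(value)))
    return findings


# Hypothetical sample data, not taken from a real host.
SAMPLE = """\
   Adapter: vmhba2
   Dumped Frames: 0
   Link Failure Count: 3
   Loss of Signal Count: 0
   Invalid Tx Word Count: 17
   Adapter: vmhba3
   Dumped Frames: 0
"""

if __name__ == "__main__":
    for adapter, counter, value in nonzero_error_counters(SAMPLE):
        print(adapter, counter, value)
```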

 

/commands/localcli_storage-san-fc-events-get.txt

This file shows recent FC events with timestamps, such as link up and link down.

 

/var/run/log/vmksummary.log

Shows timestamps of when the host booted, rebooted, or was unresponsive.
HBA statistics are understood to reset on reboot, so the most recent boot time gives the timeframe in which the FC statistics accumulated.

Sample:

2022-10-09T13:05:21Z bootstop: Host is rebooting

2022-10-09T13:10:55.351Z bootstop[2107273]: Host has booted
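
The most recent boot time can be pulled from these entries with a few lines of Python. This is a sketch that assumes every relevant line follows the two sample formats above (a UTC timestamp, "bootstop" with an optional process ID, then the message). The function names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches both sample formats; fractional seconds are accepted and discarded.
EVENT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z\s+"
    r"bootstop(?:\[\d+\])?:\s+(?P<msg>.*)$"
)


def boot_events(lines):
    """Return a list of (UTC datetime, message) for each bootstop entry."""
    events = []
    for line in lines:
        match = EVENT.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S")
            events.append((stamp.replace(tzinfo=timezone.utc), match.group("msg")))
    return events


def last_boot(events):
    """Return the newest 'Host has booted' time, or None if there is none."""
    booted = [stamp for stamp, msg in events if "Host has booted" in msg]
    return max(booted) if booted else None


if __name__ == "__main__":
    sample = [
        "2022-10-09T13:05:21Z bootstop: Host is rebooting",
        "2022-10-09T13:10:55.351Z bootstop[2107273]: Host has booted",
    ]
    print(last_boot(boot_events(sample)))
```

FC statistics collected after the returned time belong to the current boot.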

 

When performing storage array maintenance or any action that causes an array target to go offline and come back online, the Cisco Native FNIC driver may not properly log back in to the target, leaving paths in a dead state.

This issue is caused by the Cisco Native FNIC driver receiving an RSCN during the REPORT_LUNS command that is part of the nfnic port login process, which results in the driver halting and not retrying the login process. This was observed with both IBM SVC and IBM V7000 arrays, and would also be expected on any IBM Storwize array since they all use the same software stack. It would also be observed on non-IBM arrays if they issue an RSCN during the REPORT_LUNS command that the driver sends during login.

Both the performance issues and the path down/APD issues are resolved by upgrading to nfnic 4.0.0.63 or later.
Contact VMware and Cisco for additional information and support.

Driver versions can be found in /commands/localcli_software-vib-list.txt
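
A version check against the 4.0.0.63 threshold can be sketched in Python as follows. It assumes the VIB list is whitespace-separated with the VIB name in the first column and the version in the second, and that the version starts with a dotted numeric part followed by an optional build suffix after a hyphen. The function names and the sample line are hypothetical.

```python
import re

MIN_NFNIC = (4, 0, 0, 63)  # threshold stated in this article


def version_tuple(version):
    """Convert the numeric part before the first '-' into a tuple of ints."""
    numeric = version.split("-", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", numeric))


def nfnic_version(vib_list_text):
    """Return the version column of the first row named 'nfnic', else None."""
    for line in vib_list_text.splitlines():
        columns = line.split()
        if len(columns) >= 2 and columns[0] == "nfnic":
            return columns[1]
    return None


def nfnic_is_fixed(vib_list_text):
    """True if nfnic >= 4.0.0.63, False if older, None if nfnic is not listed."""
    version = nfnic_version(vib_list_text)
    if version is None:
        return None
    return version_tuple(version) >= MIN_NFNIC


if __name__ == "__main__":
    # Hypothetical row; the build suffix and other columns are made up.
    row = "nfnic  4.0.0.40-1OEM.670.0.0.8169922  Cisco  VMwareCertified  2020-01-01"
    print(nfnic_is_fixed(row))
```

Tuples are compared element by element, so 4.0.0.100 correctly sorts after 4.0.0.63, which a plain string comparison would get wrong.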


VMware KB# 80101

 

Permanent Device Loss (PDL)/All Paths Down (APD)

Permanent Device Loss (PDL)

  • A datastore is shown as unavailable in the Storage view.
  • A storage adapter indicates the Operational State of the device as Lost Communication.
  • All paths to the device are marked as Dead.
  • In the /var/log/vmkernel.log file, you see entries similar to:

 

Example

cpu2:853571)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:661: Path "vmhba4:C0:T0:L0" (PERM LOSS) command 0xa3 failed with status Device is permanently unavailable. H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

cpu2:853571)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:972:Could not select path for device "naa.60a98000572d54724a34642d71325763".

cpu2:853571)WARNING: ScsiDevice: 1223: Device :naa.60a98000572d54724a34642d71325763 has been removed or is permanently inaccessible.

cpu3:2132)ScsiDeviceIO: 2288: Cmd(0x4124403c1fc0) 0x9e, CmdSN 0xec86 to dev "naa.60a98000572d54724a34642d71325763" failed H:0x8 D:0x0 P:0x0

cpu3:2132)WARNING: NMP: nmp_DeviceStartLoop:721:NMP Device "naa.60a98000572d54724a34642d71325763" is blocked. Not starting I/O from device.

cpu2:2127)ScsiDeviceIO: 2316: Cmd(0x4124403c1fc0) 0x25, CmdSN 0xecab to dev "naa.60a98000572d54724a34642d71325763" failed H:0x1 D:0x0 P:0x0 Possible sense data: 0x5 0x25 0x0.

cpu2:854568)WARNING: ScsiDeviceIO: 7330: READ CAPACITY on device "naa.60a98000572d54724a34642d71325763" from Plugin "NMP" failed. I/O error

cpu2:854568)ScsiDevice: 1238: Permanently inaccessible device :naa.60a98000572d54724a34642d71325763 has no more open connections. It is now safe to unmount datastores (if any) and delete the device.

 

All Paths Down (APD)

  • A datastore is shown as unavailable in the Storage view.
  • A storage adapter indicates the Operational State of the device as Dead or Error.
  • All paths to the device are marked as Dead.
  • You are unable to connect directly to the ESXi host using the vSphere Client.
  • The ESXi host shows as Disconnected in vCenter Server.
  • In the /var/log/vmkernel.log file, you see entries similar to:

 

Example

cpu1:2049)WARNING: NMP: nmp_IssueCommandToDevice:2954:I/O could not be issued to device "naa.60a98000572d54724a34642d71325763" due to Not found

cpu1:2049)WARNING: NMP: nmp_DeviceRetryCommand:133:Device "naa.60a98000572d54724a34642d71325763": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.

cpu1:2049)WARNING: NMP: nmp_DeviceStartLoop:721:NMP Device "naa.60a98000572d54724a34642d71325763" is blocked. Not starting I/O from device.

cpu1:2642)WARNING: NMP: nmpDeviceAttemptFailover:599:Retry world failover device "naa.60a98000572d54724a34642d71325763" - issuing command 0x4124007ba7c0

cpu1:2642)WARNING: NMP: nmpDeviceAttemptFailover:658:Retry world failover device "naa.60a98000572d54724a34642d71325763" - failed to issue command due to Not found (APD), try again...
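
The PDL and APD examples above differ in a few recognizable strings, which makes a first-pass triage of a log easy to script. The Python sketch below groups devices by condition. The marker strings are taken only from the sample entries in this article and are an assumption, not an exhaustive list; sense data 0x5 0x25 0x0 is the SCSI code for ILLEGAL REQUEST, LOGICAL UNIT NOT SUPPORTED. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Marker strings copied from the sample log entries in this article.
PDL_MARKERS = ("(perm loss)", "permanently inaccessible", "sense data: 0x5 0x25 0x0")
APD_MARKERS = ("(apd)", "due to not found")

DEVICE = re.compile(r"naa\.[0-9a-fA-F]+")


def classify(line):
    """Return 'PDL', 'APD', or None for a single vmkernel log line."""
    lowered = line.lower()
    if any(marker in lowered for marker in PDL_MARKERS):
        return "PDL"
    if any(marker in lowered for marker in APD_MARKERS):
        return "APD"
    return None


def summarize(lines):
    """Return {device: set of conditions} for lines that name an naa device."""
    summary = defaultdict(set)
    for line in lines:
        kind = classify(line)
        device = DEVICE.search(line)
        if kind and device:
            summary[device.group(0)].add(kind)
    return dict(summary)


if __name__ == "__main__":
    sample = [
        "WARNING: ScsiDevice: 1223: Device :naa.60a98000572d54724a34642d71325763 "
        "has been removed or is permanently inaccessible.",
        'WARNING: NMP: nmp_DeviceStartLoop:721:NMP Device '
        '"naa.60a98000572d54724a34642d71325763" is blocked. Not starting I/O from device.',
    ]
    print(summarize(sample))
```

Lines that appear in both conditions, such as "is blocked. Not starting I/O from device", are deliberately left unclassified.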

 

*Check the VMware KB listed below for the resolution and additional examples based on various circumstances*.

**The SAN should also be checked as an action item for APD/PDL issues**.

VMware KB# 2004684

 

PowerPath

If PowerPath is present, there are a few additional things to check.

Compatibility - Is the PowerPath version in use supported with the running version of ESXi?
This can be verified in ESM.

Connectivity - Several types of messages can appear when PowerPath detects a lost path. See the following article:

PowerPath: How to investigate path dead in PowerPath


NMP Settings

For most Dell arrays*, except VPLEX, Round Robin (policy=rr) with IOPS=1 is recommended for best performance.
Check this setting whenever performance or latency issues are reported.

This can be found in the grabs under /commands/localcli_storage-nmp-device-list.txt or /json/localcli_storage-nmp-device-list.json 

*Always see the most current host connectivity guide and storage best practice guides for up-to-date recommendations.

VMware KB# 2069356: Adjusting Round Robin IOPS limit from default 1000 to 1

Dell EMC Host Connectivity Guide VMware ESXi Server

Unity - page 36

PowerStore - page 62

EMC XtremIO Host Connectivity Guides

Chapter 3 - page 57

 

Example of NMP settings in /commands/localcli_storage-nmp-device-list.txt

Incorrect Settings

naa.6006016051904d00f056b95dc4abd917:

   Device Display Name: DGC Fibre Channel Disk (naa.6006016051904d00f056b95dc4abd917)

   Storage Array Type: VMW_SATP_ALUA_CX

   Storage Array Type Device Config: {navireg=on, ipfilter=on} {implicit_support=on; explicit_support=on; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=2,TPG_state=AO}{TPG_id=1,TPG_state=ANO}}

   Path Selection Policy: VMW_PSP_RR

   Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;  lastPathIndex=3: NumIOsPending=0,numBytesPending=0}

   Path Selection Policy Device Custom Config: 

   Working Paths: vmhba1:C0:T1:L10, vmhba0:C0:T0:L10

 

Correct Settings

naa.6006016051904d00f056b95dc4abd917:

   Device Display Name: DGC Fibre Channel Disk (naa.6006016051904d00f056b95dc4abd917)

   Storage Array Type: VMW_SATP_ALUA_CX

   Storage Array Type Device Config: {navireg=on, ipfilter=on} {implicit_support=on; explicit_support=on; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=2,TPG_state=AO}{TPG_id=1,TPG_state=ANO}}

   Path Selection Policy: VMW_PSP_RR

   Path Selection Policy Device Config: {policy=rr,iops=1,bytes=10485760,useANO=0; lastPathIndex=3: NumIOsPending=0,numBytesPending=0}

   Path Selection Policy Device Custom Config: 

   Working Paths: vmhba1:C0:T1:L10, vmhba0:C0:T0:L10
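
The comparison between the two examples above can be automated across every device in the file. The Python sketch below assumes the layout shown in the examples (an unindented device name, then indented "Key: Value" lines) and reports Round Robin devices whose IOPS value is not 1. The function names are hypothetical, and the sketch does not account for the VPLEX exception, so review the results against the host connectivity guide.

```python
import re

IOPS = re.compile(r"\biops=(\d+)")


def round_robin_iops(text):
    """Return {device: iops} for devices using Path Selection Policy VMW_PSP_RR."""
    result = {}
    device = None
    policy = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():
            # Unindented line starts a new device section.
            device = raw.strip().rstrip(":")
            policy = None
            continue
        key, _, value = raw.strip().partition(":")
        value = value.strip()
        if key == "Path Selection Policy":
            policy = value
        elif key == "Path Selection Policy Device Config" and policy == "VMW_PSP_RR":
            match = IOPS.search(value)
            if match and device is not None:
                result[device] = int(match.group(1))
    return result


def devices_not_at_iops_1(text):
    """Return a sorted list of Round Robin devices whose iops value is not 1."""
    return sorted(d for d, iops in round_robin_iops(text).items() if iops != 1)


# Trimmed sample; device names are shortened and hypothetical.
SAMPLE = """\
naa.aaa:

   Path Selection Policy: VMW_PSP_RR

   Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0; lastPathIndex=3: NumIOsPending=0,numBytesPending=0}

naa.bbb:

   Path Selection Policy: VMW_PSP_RR

   Path Selection Policy Device Config: {policy=rr,iops=1,bytes=10485760,useANO=0; lastPathIndex=3: NumIOsPending=0,numBytesPending=0}
"""

if __name__ == "__main__":
    print(devices_not_at_iops_1(SAMPLE))
```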

 

 

Caveats

ESXi 6.7 has several known issues with Cisco nfnic drivers that cause performance and connectivity issues.
If the issue is related to one of the above, verify the Cisco nfnic driver version and check the VMware Knowledgebase (KB) for impacted versions.

The driver version is found in the /commands/localcli_software-vib-list.txt file.

Additional Information
In the event other teams must be engaged, be sure to get the following:

  • Logs (switch/storage)
  • Storage SN#
  • Date & time of the issue

If a customer requests assistance engaging VMware, direct them to the VMware "contact us" page.
Support Contact Options

See the product documentation, such as the Release Notes and the CLI Common messages guide, for up-to-date information about known issues and resolutions.

Affected Products

PowerPath, PowerPath/VE, PowerPath/VE for VMware
Article Properties
Article Number: 000205090
Article Type: How To
Last Modified: 12 Nov 2025
Version:  7