PowerPath/VE for VMware cannot claim (all) VPLEX LUNs at boot time
Summary: When the host is rebooted, NMP manages some or all VPLEX LUNs (instead of PowerPath/VE).
Symptoms
Environment:
OS: VMware ESXi 6.0.0 Update 2 (build-3620759, build-4192238)
EMC SW: PowerPath/VE for VMware vSphere 6.0
EMC SW: PowerPath/VE for VMware vSphere 6.0 SP1
EMC SW: PowerPath/VE for VMware vSphere 6.1
Server: HP ProLiant BL460c Gen9
Host Bus Adapter: Emulex Corporation Emulex OneConnect OCe14000, FCoE Initiator: 650FLB CNA
HBA Driver: lpfc 11.1.145.18-1OEM.600.0.0.2768847 EMU VMwareCertified 2016-12-04
Product: VPLEX (5410, 5520)
From vmkernel.log
2017-05-16T08:06:50.035Z cpu21:33912)ScsiClaimrule: 1165: The current claimrules indicate that path vmhba0:C0:T0:L1 should be claimed by plugin PowerPath.
2017-05-16T08:06:50.035Z cpu21:33912)ScsiClaimrule: 1169: Path vmhba0:C0:T6:L1 which appears to refer to the same physical media as path vmhba0:C0:T0:L1 is already claimed by plugin NMP.
2017-05-16T08:06:50.035Z cpu21:33912)ScsiClaimrule: 1171: If neither of these paths is being masked by ESX, this condition indicates a problem with the claimrules.
2017-05-16T08:06:50.035Z cpu21:33912)WARNING: ScsiPath: 608: Path vmhba0:C0:T0:L1 claims to be a VVol PE but has a version of 4 (expected 5 or higher). Not treating it as a PE.
2017-05-16T08:06:50.036Z cpu21:33912)ScsiPath: 5549: Plugin 'NMP' claimed path 'vmhba0:C0:T0:L1'
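The symptom in the log excerpt above can be searched for in a collected vmkernel.log. The following Python sketch is illustrative only: it assumes the message wording matches the excerpt exactly, and the function name find_misclaimed_paths and the sample constant are hypothetical helpers, not part of any PowerPath or VMware tool.

```python
import re

# Patterns assume the exact vmkernel.log wording quoted in this article.
SHOULD_CLAIM = re.compile(r"path (\S+) should be claimed by plugin (\w+)")
CLAIMED = re.compile(r"Plugin '(\w+)' claimed path '([^']+)'")


def find_misclaimed_paths(lines):
    """Return {path: (expected_plugin, actual_plugin)} for each mismatch."""
    expected = {}
    mismatches = {}
    for line in lines:
        m = SHOULD_CLAIM.search(line)
        if m:
            # Remember which plugin the claim rules expect for this path.
            expected[m.group(1)] = m.group(2)
            continue
        m = CLAIMED.search(line)
        if m:
            actual, path = m.group(1), m.group(2)
            want = expected.get(path)
            if want is not None and want != actual:
                mismatches[path] = (want, actual)
    return mismatches


# Two of the lines from the excerpt above, used as sample input.
SAMPLE_CLAIM_LOG = [
    "2017-05-16T08:06:50.035Z cpu21:33912)ScsiClaimrule: 1165: The current "
    "claimrules indicate that path vmhba0:C0:T0:L1 should be claimed by "
    "plugin PowerPath.",
    "2017-05-16T08:06:50.036Z cpu21:33912)ScsiPath: 5549: Plugin 'NMP' "
    "claimed path 'vmhba0:C0:T0:L1'",
]

if __name__ == "__main__":
    for path, (want, got) in find_misclaimed_paths(SAMPLE_CLAIM_LOG).items():
        print(f"{path}: expected {want}, claimed by {got}")
```

To check a real log, pass an open file object instead of the sample list, since the function only iterates over lines.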
Cause
SCSI inquiry commands to the affected devices failed during the boot process because of transient host adapter errors, so PowerPath/VE could not claim the paths.
Resolution
In this case, the host vendor replaced two HBA cards in two servers, swapping the FLB 650 for the FLB 630.
After the servers were rebooted, the problem did not recur and PowerPath/VE managed the devices properly.
Additional Information
The model of HBA can be found in the localcli_storage-core-adapter-list.txt output.
The server make and model can be found in esxcfg-info_-a.txt.FRAG-00000.txt.
The version of VMware can be found in vmware_-vl.txt.
The model of array and firmware can be found in localcli_storage-core-device-list.txt.
To properly troubleshoot the issue, an engineering special build was used.
PowerPath relies on the SCSI inquiry command (opcode 0x12) to claim a path. The engineering test package logs show that inquiry commands initially failed with HOST_RETRY (0xc) or HOST_NO_CONNECT (0x1) errors. Later, when ESXi offered the device again, the inquiries succeeded and PowerPath claimed the device. (Without an engineering build, this second sequence is not seen.)
In response to the HOST_RETRY errors, PowerPath retried the inquiry command multiple times at an interval of about 0.1 seconds, but the host adapter still failed the command, as the logs show.
// Inquiry failure at the beginning
2016-12-14T11:53:51.561Z cpu24:33396)PowerPath:Claiming path vmhba0:C0:T1:L0
2016-12-14T11:53:51.561Z cpu24:33396)PowerPath:PowerPlatformScsiIoErrorIsRetryable: cmd=0x12 Failed H: 0xc S: 0x0 P: 0x0 Path=vmhba0:C0:T1:L0
2016-12-14T11:53:51.663Z cpu24:33396)PowerPath:PowerPlatformScsiIoErrorIsRetryable: cmd=0x12 Failed H: 0xc S: 0x0 P: 0x0 Path=vmhba0:C0:T1:L0
2016-12-14T11:53:51.765Z cpu24:33396)PowerPath:PowerPlatformScsiIoErrorIsRetryable: cmd=0x12 Failed H: 0xc S: 0x0 P: 0x0 Path=vmhba0:C0:T1:L0
2016-12-14T11:53:51.867Z cpu24:33396)PowerPath:PowerPlatformScsiIoErrorIsRetryable: cmd=0x12 Failed H: 0xc S: 0x0 P: 0x0 Path=vmhba0:C0:T1:L0
2016-12-14T11:53:51.969Z cpu24:33396)PowerPath:PowerPlatformScsiIoErrorIsRetryable: cmd=0x12 Failed H: 0xc S: 0x0 P: 0x0 Path=vmhba0:C0:T1:L0
2016-12-14T11:53:56.772Z cpu46:33491)ALERT: PowerPath:MpxRecognize failed. Path vmhba0:C0:T1:L0 not claimed
// Inquiry succeeds at the end and PowerPath claims the device. This sequence does not happen with a regular GA build.
2016-12-14T11:54:08.542Z cpu12:34080)PowerPath:Claiming path vmhba0:C0:T1:L0
2016-12-14T11:54:08.545Z cpu12:34080)PowerPath:Path Claim: Successfully claimed path vmhba0:C0:T1:L0
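The retry cadence and host status in the failure sequence above can be checked mechanically. The Python sketch below is a minimal illustration that assumes the engineering-build message format shown in this article. The names parse_inquiry_failures and retry_intervals are hypothetical, and only the two host status values named in this article are decoded.

```python
import re
from datetime import datetime

# Host status values named in this article; any other value is shown as hex.
HOST_STATUS = {0x1: "HOST_NO_CONNECT", 0xC: "HOST_RETRY"}
INQUIRY_OPCODE = 0x12  # SCSI INQUIRY command opcode

# Assumes the PowerPath engineering-build log format quoted above.
FAILURE = re.compile(
    r"^(\S+)Z .*PowerPlatformScsiIoErrorIsRetryable: "
    r"cmd=(0x[0-9a-fA-F]+) Failed H: (0x[0-9a-fA-F]+) .*Path=(\S+)"
)


def parse_inquiry_failures(lines):
    """Return (timestamp, path, host_status_name) for each failed INQUIRY."""
    failures = []
    for line in lines:
        m = FAILURE.search(line.strip())
        if not m or int(m.group(2), 16) != INQUIRY_OPCODE:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        host = int(m.group(3), 16)
        failures.append((ts, m.group(4), HOST_STATUS.get(host, hex(host))))
    return failures


def retry_intervals(failures):
    """Seconds between consecutive failures, in the order given."""
    times = [f[0] for f in failures]
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]


# Three consecutive failure lines from the excerpt above.
SAMPLE_RETRY_LOG = [
    "2016-12-14T11:53:51.561Z cpu24:33396)PowerPath:PowerPlatformScsiIoErrorIsRetryable: "
    "cmd=0x12 Failed H: 0xc S: 0x0 P: 0x0 Path=vmhba0:C0:T1:L0",
    "2016-12-14T11:53:51.663Z cpu24:33396)PowerPath:PowerPlatformScsiIoErrorIsRetryable: "
    "cmd=0x12 Failed H: 0xc S: 0x0 P: 0x0 Path=vmhba0:C0:T1:L0",
    "2016-12-14T11:53:51.765Z cpu24:33396)PowerPath:PowerPlatformScsiIoErrorIsRetryable: "
    "cmd=0x12 Failed H: 0xc S: 0x0 P: 0x0 Path=vmhba0:C0:T1:L0",
]

if __name__ == "__main__":
    parsed = parse_inquiry_failures(SAMPLE_RETRY_LOG)
    print([status for _, _, status in parsed])
    print(retry_intervals(parsed))
```

For the sample lines, every failure decodes to HOST_RETRY and the spacing between attempts is 0.102 seconds, which is consistent with the roughly 0.1 second retry interval described above.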
At the same time, the lpfc driver reported Link Down and Link Up events, delayed port discovery, and similar messages.
2016-12-14T11:53:46.586Z cpu30:33490)WARNING: lpfc: lpfc_mbx_cmpl_read_topology:3271: 0:1305 Link Down Event x5 received Data: x5 x20 x800220 x0
2016-12-14T11:53:46.704Z cpu4:33493)WARNING: lpfc: lpfc_mbx_cmpl_read_topology:3271: 1:1305 Link Down Event x5 received Data: x5 x20 x800220 x0
2016-12-14T11:53:49.334Z cpu30:33490)WARNING: lpfc: lpfc_mbx_cmpl_read_topology:3247: 0:1303 Link Up Event x6 received Data: x6 x0 x5 x0 x0
2016-12-14T11:53:52.337Z cpu25:33493)WARNING: lpfc: lpfc_mbx_cmpl_read_topology:3247: 1:1303 Link Up Event x6 received Data: x6 x0 x5 x0 x0
2016-12-14T11:53:52.452Z cpu25:33493)WARNING: lpfc: lpfc_sli4_async_fip_evt:5702: 1:2546 New FCF event, evt_tag:x7, index:x0
2016-12-14T11:53:52.479Z cpu24:33396)PowerPath:PowerPlatformScsiIoErrorIsRetryable: cmd=0x12 Failed H: 0xc S: 0x0 P: 0x0 Path=vmhba0:C0:T1:L0
2016-12-14T11:53:52.505Z cpu25:33493)WARNING: lpfc: lpfc_do_scr_ns_plogi:8098: 1:3334 Delay fc port discovery for 10 seconds
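One way to confirm that the inquiry failures coincide with adapter link events is to flag every failed inquiry logged shortly after an lpfc Link Down or Link Up message. The Python sketch below is illustrative only. It assumes the log lines are in chronological order and match the formats quoted above; the function name failures_near_link_events and the default 10-second window are arbitrary choices for this example, not values taken from PowerPath or the lpfc driver.

```python
import re
from datetime import datetime, timedelta

# ISO-style timestamp at the start of each vmkernel.log line.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z ")


def timestamp(line):
    """Parse the leading timestamp, or return None if the line has none."""
    m = TS.match(line)
    if not m:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def failures_near_link_events(lines, window_seconds=10.0):
    """Return failed-INQUIRY lines logged within window_seconds after
    the most recent lpfc Link Down/Up event (window is an assumption)."""
    window = timedelta(seconds=window_seconds)
    last_link_event = None
    hits = []
    for line in lines:
        ts = timestamp(line)
        if ts is None:
            continue
        if "lpfc" in line and ("Link Down Event" in line or "Link Up Event" in line):
            last_link_event = ts
        elif "cmd=0x12 Failed" in line and last_link_event is not None:
            if timedelta(0) <= ts - last_link_event <= window:
                hits.append(line)
    return hits


# Two lines from the excerpt above: a Link Up event, then a failed inquiry.
SAMPLE_LINK_LOG = [
    "2016-12-14T11:53:52.337Z cpu25:33493)WARNING: lpfc: "
    "lpfc_mbx_cmpl_read_topology:3247: 1:1303 Link Up Event x6 received "
    "Data: x6 x0 x5 x0 x0",
    "2016-12-14T11:53:52.479Z cpu24:33396)PowerPath:PowerPlatformScsiIoErrorIsRetryable: "
    "cmd=0x12 Failed H: 0xc S: 0x0 P: 0x0 Path=vmhba0:C0:T1:L0",
]

if __name__ == "__main__":
    for hit in failures_near_link_events(SAMPLE_LINK_LOG):
        print(hit)
```

In the sample, the failed inquiry is logged 0.142 seconds after the Link Up event, so it is flagged. A match like this shows only that the two events are close in time; determining the cause still requires the adapter vendor's analysis.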
To claim a path, an inquiry has to succeed. Because of host adapter errors during the boot process, the inquiry fails and, as a result, PowerPath does not claim the device.
This is not a PowerPath issue.
Our recommendation is to engage VMware or the adapter vendor to determine the reason for these transient HOST_RETRY (0xc) and HOST_NO_CONNECT (0x1) errors during host boot.
Once these adapter-related transient errors are fixed, PowerPath should have no problem claiming the device.