PowerPath/VE: VMware ESXi panic message: "#PF Exception 14 in world vmm0"
Summary: VMware ESXi #PF Exception 14
Symptoms
OS: VMware ESXi 6.5.0 build
Dell Software: PowerPath/VE 6.3 (build 105)
Dell Software: PowerPath/VE 6.4 (build 103)
Dell Software: PowerPath/VE 6.5 (build 110)
Dell Hardware: Symmetrix
Unexpected VMware ESXi server panic with no apparent trigger event
PowerPath/VE and Symmetrix running 5978 code or later is required in order to be exposed to this issue.
2019-06-11T05:56:03.906Z cpu23:47993633)@BlueScreen: #PF Exception 14 in world 47993633:vmm0:FRAJXSA IP 0x418024500c9a addr 0x410006dcffc4 PTEs:0x8000853023;0x800082e023;0x80008a0023;0x0; 2019-06-11T05:56:03.907Z cpu23:47993633)Code start: 0x418024200000 VMK uptime: 69:00:05:14.200 2019-06-11T05:56:03.907Z cpu23:47993633)0x43941909bb50:[0x418024500c9a]Sched_SysServiceDone@vmkernel#nover+0x8a stack: 0x439dcb2afe80 2019-06-11T05:56:03.907Z cpu23:47993633)0x43941909bbb0:[0x4180245360ce]SCSICompleteAdapterCommand@vmkernel#nover+0x152 stack: 0x410006dd0040 2019-06-11T05:56:03.908Z cpu23:47993633)0x43941909bc30:[0x418024b69a09]SCSILinuxWorldletFn@com.vmware.driverAPI#9.2+0x3f1 stack: 0x4180242d1a38 2019-06-11T05:56:03.908Z cpu23:47993633)0x43941909bd90:[0x418024326ea8]WorldletBHHandler@vmkernel#nover+0x478 stack: 0x0 2019-06-11T05:56:03.909Z cpu23:47993633)0x43941909bef0:[0x4180242b1cb0]BH_DrainAndDisableInterrupts@vmkernel#nover+0x100 stack: 0x0 2019-06-11T05:56:03.909Z cpu23:47993633)0x43941909bf80:[0x418024319e66]VMMVMKCall_Call@vmkernel#nover+0x196 stack: 0x43941909bfec 2019-06-11T05:56:03.910Z cpu23:47993633)0x43941909bfe0:[0x41802434b8a2]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x41802434b894 2019-06-11T05:56:03.913Z cpu23:47993633)base fs=0x0 gs=0x418045c00000 Kgs=0x0 2019-06-24T08:43:40.022Z cpu17:169970)@BlueScreen: #PF Exception 14 in world 169970:vmm0:FRAWINE IP 0x41802f30155a addr 0x410006d6ffc4 PTEs:0x8000053023;0x800002b023;0x800009e023;0x0; 2019-06-24T08:43:40.023Z cpu17:169970)Code start: 0x41802f000000 VMK uptime: 6:00:01:30.899 2019-06-24T08:43:40.023Z cpu17:169970)0x43923f91bd30:[0x41802f30155a]Sched_SysServiceDone@vmkernel#nover+0x8a stack: 0xfc40a085 2019-06-24T08:43:40.023Z cpu17:169970)0x43923f91bd90:[0x41802f126e31]WorldletBHHandler@vmkernel#nover+0xe1 stack: 0x418042800c00 2019-06-24T08:43:40.024Z cpu17:169970)0x43923f91bef0:[0x41802f0b1db0]BH_DrainAndDisableInterrupts@vmkernel#nover+0x100 stack: 0x0 2019-06-24T08:43:40.024Z cpu17:169970)0x43923f91bf80:[0x41802f11a186]VMMVMKCall_Call@vmkernel#nover+0x196 stack: 0x43923f91bfec 2019-06-24T08:43:40.025Z cpu17:169970)0x43923f91bfe0:[0x41802f14b8a2]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x41802f14b894 2019-06-24T08:43:40.028Z cpu17:169970)base fs=0x0 gs=0x418044400000 Kgs=0x0
Cause
VMware Engineering has determined that this issue is caused by a preemption anomaly resulting in random context panicking in SchedSysServiceContextPut().
PowerPath/VE for VMware 6.3, 6.4, and 6.5 has an issue in its app finger printing feature which may cause a preemption anomaly.
Resolution
While troubleshooting this issue internally, a PowerPath/VE issue was discovered which relates to the app finger printing feature. While we cannot be 100% certain it is the cause of the panic seen by the user, as a precaution we are recommending any user that has experienced this type of panic to disable the app finger printing feature.
Workaround (Only applicable if below Symmetrix microcode 5978.221):
Disable app finger printing.
Resolution:
Upgrade to PowerPath/VE 7.0 P01 or a later release which are currently available for download on the Dell Support website.
Additional Information
Below are the rpowermt commands to display and disable the "app finger printing" feature.
To verify if the feature is enabled:
# rpowermt display options host=<ESXi host name/IP> Show CLARiiON LUN names: true Path Latency Monitor: Off Performance Monitor: disabled Autostandby: IOs per Failure (iopf): enabled iopf aging period : 1 d iopf limit : 6000 Storage System Class Attributes ------------ ---------- Symmetrix periodic autorestore = on reactive autorestore = on auto host registration = enabled app finger printing = enabled device to array performance report = enabled device in use to array report = enabled
To turn off the feature:
# rpowermt set app_finger_printing=off host=<ESXi host name/IP>
To verify if the feature is disabled:
# rpowermt display options host=<ESXi host name/IP> Show CLARiiON LUN names: true Path Latency Monitor: Off Performance Monitor: disabled Autostandby: IOs per Failure (iopf): enabled iopf aging period : 1 d iopf limit : 6000 Storage System Class Attributes ------------ ---------- Symmetrix periodic autorestore = on reactive autorestore = on auto host registration = enabled app finger printing = disabled device to array performance report = enabled device in use to array report = enabled
- This feature enable/disable does not require any maintenance activity on ESXi hosts, and it is persistent across reboots.
- There are no changes required on the array side associated with this feature.
- This workaround is only applicable for Symmetrix microcode below 5978.221.
- Starting with Symmetrix microcode 5978.221 and later I/O tagging is enabled on the Symmetrix which triggers a defect in PowerPath/VE which causes a PSOD.