August 7th, 2022 03:00

VMware ESXi 7.0 PSOD/MCA report

Dear Dell Community,

I am running VMware ESXi in a medium-sized infrastructure. This morning the ESXi host had a PSOD (Purple Screen of Death). Since the server is no longer under warranty, I would like to ask for your help debugging the issue.

I am running the following VMware build:
VMware ESXi 7.0.2 build-18538813
VMware ESXi 7.0 Update 2

 

I found the following entries in the logs:

2022-07-02T08:02:27.533Z cpu2:2097540)HPP: HppAADetermineStatus:96: Unknown Check condition 0/2 0x2 0x3a 0x1.
2022-07-02T08:02:27.579Z cpu4:2097540)ScsiUid: 319: Path 'vmhba2:C0:T5:L0' does not support VPD Device Id page.
2022-07-02T08:02:27.594Z cpu0:2097540)VMWARE SCSI Id: Could not get disk id for vmhba2:C0:T5:L0
2022-07-02T08:06:18.843Z cpu28:2098033)ScsiDeviceIO: 4298: Cmd(0x45d9161fe100) 0x85, CmdSN 0xc33c from world 2100216 to dev "naa.6d09466018c3430021709d9206764bd9" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
2022-07-02T08:06:18.845Z cpu41:2100216)WARNING: NvmeScsi: 156: SCSI opcode 0x85 (0x45d9161fe100) on path vmhba3:C0:T0:L0 to namespace t10.NVMe____Dell_Express_Flash_PM1725a_1.6TB_AIC____68010071E5382500 failed with NVMe error status: 0x1
2022-07-02T08:06:18.845Z cpu41:2100216)WARNING: translating to SCSI error H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
2022-07-02T08:06:18.845Z cpu48:2097267)ScsiDeviceIO: 4298: Cmd(0x45d9161fe100) 0x85, CmdSN 0xc341 from world 2100216 to dev "t10.NVMe____Dell_Express_Flash_PM1725a_1.6TB_AIC____68010071E5382500" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
2022-07-02T08:07:27.534Z cpu27:2097540)HPP: HppAADetermineStatus:96: Unknown Check condition 0/2 0x2 0x3a 0x1.
2022-07-02T08:07:27.579Z cpu27:2097540)ScsiUid: 319: Path 'vmhba2:C0:T5:L0' does not support VPD Device Id page.
2022-07-02T08:07:27.594Z cpu27:2097540)VMWARE SCSI Id: Could not get disk id for vmhba2:C0:T5:L0
2022-07-02T08:11:42.170Z cpu21:2098032)ScsiDeviceIO: 4298: Cmd(0x45b918a501c0) 0x1a, CmdSN 0x6a17f4 from world 0 to dev "naa.6d09466018c3430021709d9206764bd9" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0
2022-07-02T08:12:27.534Z cpu1:2097540)HPP: HppAADetermineStatus:96: Unknown Check condition 0/2 0x2 0x3a 0x1.
2022-07-02T08:12:27.580Z cpu1:2097540)ScsiUid: 319: Path 'vmhba2:C0:T5:L0' does not support VPD Device Id page.
...skipping...

>>>>
2022-08-07T08:58:34.929Z cpu38:3285114)@BlueScreen: Machine Check Exception on PCPU38 in world 3285114:vmm2:ceph-rg
System has encountered a Hardware Error - Please contact the hardware vendor
>>>>

2022-08-07T08:58:34.940Z cpu38:3285114)Code start: 0x420035600000 VMK uptime: 184:18:56:02.157
2022-08-07T08:58:34.956Z cpu38:3285114)0x4538e5f1beb0:[0x42003574e446]IDTVMMMCE@vmkernel#nover+0x12 stack: 0xffffffffffffffff
2022-08-07T08:58:34.971Z cpu38:3285114)0x4538e5f1bf90:[0x420035750ba6]IDT_VMMForwardMCE@vmkernel#nover+0xb stack: 0x0
2022-08-07T08:58:34.986Z cpu38:3285114)0x4538e5f1bfa0:[0x420035728859]VMMVMKCall_Call@vmkernel#nover+0xee stack: 0x0
2022-08-07T08:58:35.004Z cpu38:3285114)0x4538e5f1bfe0:[0x420035754549]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x42003575453c
2022-08-07T08:58:35.014Z cpu38:3285114)base fs=0x0 gs=0x420049800000 Kgs=0x0

>>>>
2022-08-07T08:58:34.724Z cpu38:3285114)MCA: 196: UC Excp G5 B1 Sbb80000000000174 A0 M86 P0/0 Cache Hierarchy: Level 0 Data Cache Eviction Error.
>>>>

2022-08-07T08:58:35.041Z cpu38:3285114)CPU model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, FMS: 06/4f/1, uCodeRev: b00003e
2022-08-07T08:58:35.041Z cpu38:3285114)PRODUCTNAME:PowerEdge R730, VENDORNAME:Dell Inc., SERIAL_NUMBER:52SJ1M2, SERVER_UUID:4c4c4544-0032-5310-804a-b5c04f314d32, VERSION:, SKU:SKU=NotProvided;ModelName=PowerEdge R730, FAMILY:
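
In case it helps anyone reading the MCA line above: the value after "S" looks like the raw machine-check bank status register, so it can be decoded by hand. Below is a minimal Python sketch, assuming that "S" is IA32_MCi_STATUS and that the architectural bit layout from the Intel SDM applies (flag bits 63 to 55, MCA error code in bits 15:0, cache hierarchy codes of the form 000F 0001 RRRR TTLL). The function name `decode_mci_status` and the table names are made up for illustration; this is not a VMware or Dell tool.

```python
# Assumption: the "S" field of the vmkernel MCA line is IA32_MCi_STATUS.
# Bit positions follow the architectural layout in the Intel SDM.
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN", 59: "MISCV",
         58: "ADDRV", 57: "PCC", 56: "S", 55: "AR"}

# Sub-fields of a cache hierarchy error code: 000F 0001 RRRR TTLL
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TYPE = {0: "Instruction", 1: "Data", 2: "Generic"}
LEVEL = {0: "Level 0", 1: "Level 1", 2: "Level 2", 3: "Generic"}


def decode_mci_status(status):
    """Split a 64-bit status value into flags and error-code fields."""
    result = {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }
    code = result["mca_error_code"]
    # Ignore the F bit (bit 12) and match the 0001 marker in bits 11:8.
    if code & 0xEF00 == 0x0100:
        result["cache"] = {
            "request": REQUEST.get((code >> 4) & 0xF, "unknown"),
            "type": TYPE.get((code >> 2) & 0x3, "unknown"),
            "level": LEVEL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    decoded = decode_mci_status(0xBB80000000000174)
    print(decoded["flags"])  # ['VAL', 'UC', 'EN', 'MISCV', 'PCC', 'S', 'AR']
    print(decoded["cache"])  # {'request': 'EVICT', 'type': 'Data', 'level': 'Level 0'}
```

If those assumptions hold, the decoded fields agree with the text ESXi printed (Level 0 data cache eviction error), and the UC and PCC flags mark it as an uncorrected error with processor context corrupt. ADDRV is clear, which would fit the "A0" in the same line.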

 

Based on the MCA report found in the vmkernel dump, what do you think caused the PSOD?

Looking forward to your replies. 
