VMware ESXi 7.0 PSOD/MCA report
Dear Dell Community,
I am running VMware ESXi in a medium-sized infrastructure. This morning the ESXi host crashed with a PSOD (Purple Screen of Death). Since the server is no longer under warranty, I would like to ask for your help in debugging the issue.
I am running the following VMware build:
VMware ESXi 7.0.2 build-18538813
VMware ESXi 7.0 Update 2
I found the following entries in the logs:
2022-07-02T08:02:27.533Z cpu2:2097540)HPP: HppAADetermineStatus:96: Unknown Check condition 0/2 0x2 0x3a 0x1.
2022-07-02T08:02:27.579Z cpu4:2097540)ScsiUid: 319: Path 'vmhba2:C0:T5:L0' does not support VPD Device Id page.
2022-07-02T08:02:27.594Z cpu0:2097540)VMWARE SCSI Id: Could not get disk id for vmhba2:C0:T5:L0
2022-07-02T08:06:18.843Z cpu28:2098033)ScsiDeviceIO: 4298: Cmd(0x45d9161fe100) 0x85, CmdSN 0xc33c from world 2100216 to dev "naa.6d09466018c3430021709d9206764bd9" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
2022-07-02T08:06:18.845Z cpu41:2100216)WARNING: NvmeScsi: 156: SCSI opcode 0x85 (0x45d9161fe100) on path vmhba3:C0:T0:L0 to namespace t10.NVMe____Dell_Express_Flash_PM1725a_1.6TB_AIC____68010071E5382500 failed with NVMe error status: 0x1
2022-07-02T08:06:18.845Z cpu41:2100216)WARNING: translating to SCSI error H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
2022-07-02T08:06:18.845Z cpu48:2097267)ScsiDeviceIO: 4298: Cmd(0x45d9161fe100) 0x85, CmdSN 0xc341 from world 2100216 to dev "t10.NVMe____Dell_Express_Flash_PM1725a_1.6TB_AIC____68010071E5382500" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
2022-07-02T08:07:27.534Z cpu27:2097540)HPP: HppAADetermineStatus:96: Unknown Check condition 0/2 0x2 0x3a 0x1.
2022-07-02T08:07:27.579Z cpu27:2097540)ScsiUid: 319: Path 'vmhba2:C0:T5:L0' does not support VPD Device Id page.
2022-07-02T08:07:27.594Z cpu27:2097540)VMWARE SCSI Id: Could not get disk id for vmhba2:C0:T5:L0
2022-07-02T08:11:42.170Z cpu21:2098032)ScsiDeviceIO: 4298: Cmd(0x45b918a501c0) 0x1a, CmdSN 0x6a17f4 from world 0 to dev "naa.6d09466018c3430021709d9206764bd9" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0
2022-07-02T08:12:27.534Z cpu1:2097540)HPP: HppAADetermineStatus:96: Unknown Check condition 0/2 0x2 0x3a 0x1.
2022-07-02T08:12:27.580Z cpu1:2097540)ScsiUid: 319: Path 'vmhba2:C0:T5:L0' does not support VPD Device Id page.
...skipping...
>>>> 2022-08-07T08:58:34.929Z cpu38:3285114)@BlueScreen: Machine Check Exception on PCPU38 in world 3285114:vmm2:ceph-rg
System has encountered a Hardware Error - Please contact the hardware vendor
>>>> 2022-08-07T08:58:34.940Z cpu38:3285114)Code start: 0x420035600000 VMK uptime: 184:18:56:02.157
2022-08-07T08:58:34.956Z cpu38:3285114)0x4538e5f1beb0:[0x42003574e446]IDTVMMMCE@vmkernel#nover+0x12 stack: 0xffffffffffffffff
2022-08-07T08:58:34.971Z cpu38:3285114)0x4538e5f1bf90:[0x420035750ba6]IDT_VMMForwardMCE@vmkernel#nover+0xb stack: 0x0
2022-08-07T08:58:34.986Z cpu38:3285114)0x4538e5f1bfa0:[0x420035728859]VMMVMKCall_Call@vmkernel#nover+0xee stack: 0x0
2022-08-07T08:58:35.004Z cpu38:3285114)0x4538e5f1bfe0:[0x420035754549]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x42003575453c
2022-08-07T08:58:35.014Z cpu38:3285114)base fs=0x0 gs=0x420049800000 Kgs=0x0
>>>> 2022-08-07T08:58:34.724Z cpu38:3285114)MCA: 196: UC Excp G5 B1 Sbb80000000000174 A0 M86 P0/0 Cache Hierarchy: Level 0 Data Cache Eviction Error.
>>>> 2022-08-07T08:58:35.041Z cpu38:3285114)CPU model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, FMS: 06/4f/1, uCodeRev: b00003e
2022-08-07T08:58:35.041Z cpu38:3285114)PRODUCTNAME:PowerEdge R730, VENDORNAME:Dell Inc., SERIAL_NUMBER:52SJ1M2, SERVER_UUID:4c4c4544-0032-5310-804a-b5c04f314d32, VERSION:, SKU:SKU=NotProvided;ModelName= PowerEdge R730, FAMILY:
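To make the MCA line easier to read, the status value can be decoded by hand. The following is only a minimal sketch, under two assumptions: that the field after "S" in the MCA line is the raw IA32_MCi_STATUS register of the reporting bank, and that the bit layout and cache hierarchy error encoding are the ones described in the Intel SDM. The function names extract_status and decode_status are made up for illustration and are not part of any VMware or Dell tool.

```python
import re

# Flag bits of IA32_MCi_STATUS per the Intel SDM, highest bit first.
# Bits 56 (S) and 55 (AR) are only defined when the CPU supports software
# error recovery; that is assumed here, not confirmed by the log.
STATUS_FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN", 59: "MISCV",
                58: "ADDRV", 57: "PCC", 56: "S", 55: "AR"}

# Sub-fields of the cache hierarchy error code: 000F 0001 RRRR TTLL
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "Instruction", 1: "Data", 2: "Generic"}
LEVEL = {0: "Level 0", 1: "Level 1", 2: "Level 2", 3: "Generic"}


def extract_status(line):
    """Return the 64-bit value after 'S' in an 'MCA:' log line, or None."""
    match = re.search(r"\bS([0-9a-fA-F]{16})\b", line)
    return int(match.group(1), 16) if match else None


def decode_status(status):
    """Split a status value into flag names, error codes and a description."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF
    model_code = (status >> 16) & 0xFFFF
    description = None
    # Cache hierarchy errors have bits 11:8 == 0001 and bits 15:13 == 000.
    if (mca_code & 0xEF00) == 0x0100:
        request = REQUEST.get((mca_code >> 4) & 0xF, "reserved")
        kind = TRANSACTION.get((mca_code >> 2) & 0x3, "reserved")
        level = LEVEL[mca_code & 0x3]
        description = f"{level} {kind} Cache {request}"
    return {"flags": flags, "mca_code": mca_code,
            "model_code": model_code, "description": description}


line = ("MCA: 196: UC Excp G5 B1 Sbb80000000000174 A0 M86 P0/0 "
        "Cache Hierarchy: Level 0 Data Cache Eviction Error.")
print(decode_status(extract_status(line)))
```

For the value above this yields the flags VAL, UC, EN, MISCV, PCC, S and AR and the description "Level 0 Data Cache EVICT", which agrees with the text ESXi already printed. ADDRV is not set, which is consistent with the A0 field. The sketch only restates what the register reports; it does not by itself identify which component is faulty.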
Based on the MCA report found in the vmkernel dump, what do you think caused the PSOD?
Looking forward to your replies.