8 Krypton

R720 HW error cause reboot or core crash

Hi!

I have the following error on an R720 server.

The iDRAC shows: "A PCI parity error was detected on a component at bus 0 device 5 function 0."

It also reports problems with PSU1 (detected, then not detected, etc.).

The diagnostic tool says:

Error Code:2000-0251

Validation: 74526

Moderator

RE: R720 HW error cause reboot or core crash

Hello

The iDRAC shows: A PCI parity error was detected on a component at bus 0 device 5 function 0.

What device is in that slot? This is most likely a PCIe card.
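To answer "what device is in that slot", the iDRAC's bus/device/function triple can be matched against the `bus:device.function` address that lspci prints in hexadecimal. A minimal Python sketch, assuming the iDRAC message uses decimal numbers (for bus 0, device 5, function 0 the result is the same either way); `bdf_to_lspci` is a hypothetical helper, not a Dell or lspci tool:

```python
def bdf_to_lspci(bus: int, device: int, function: int) -> str:
    """Format a PCI bus/device/function triple as lspci prints it: BB:DD.F in hex."""
    # PCI limits: 256 buses, 32 devices per bus, 8 functions per device.
    if not 0 <= bus <= 0xFF:
        raise ValueError(f"bus out of range: {bus}")
    if not 0 <= device <= 0x1F:
        raise ValueError(f"device out of range: {device}")
    if not 0 <= function <= 0x7:
        raise ValueError(f"function out of range: {function}")
    return f"{bus:02x}:{device:02x}.{function:x}"

# The iDRAC message reports "bus 0 device 5 function 0".
print(bdf_to_lspci(0, 5, 0))  # 00:05.0
```

The resulting address can be passed to `lspci -s 00:05.0 -vv` to show only that device.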

Error Code:2000-0251

That just says that there are errors in the hardware log. You must clear the log before running diagnostics if it contains errors. I recommend saving the logs before clearing.

Thanks

Daniel Mysinger
Dell EMC, Enterprise Engineer

8 Krypton

RE: R720 HW error cause reboot or core crash

It is a XenServer host, and I ran lspci -vv.

I think this is the device:

00:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address Map, VTd_Misc, System Management (rev 07)
        Subsystem: Dell Device 048c
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag- RBE-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
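In `lspci -vv` output each flag ends in `+` (set) or `-` (clear). On the Status line, `ParErr` is the master data parity error bit, `<PERR` means a parity error was detected and `>SERR` means the device signaled a system error; the DevSta line carries the PCIe error bits. A small Python sketch, with hypothetical helper names, that pulls those flags out of the lines posted above:

```python
import re

def parse_flags(line):
    """Map each `Name+` / `Name-` token of an lspci -vv line to True (set) or False (clear)."""
    flags = {}
    for token in line.split():
        # Accept an optional '<' or '>' prefix and a trailing comma, e.g. "<PERR-" or "TimeoutDis-,"
        m = re.fullmatch(r"([<>]?\w+)([+-]),?", token)
        if m:
            flags[m.group(1)] = m.group(2) == "+"
    return flags

def errors_set(line, names):
    """Return the subset of `names` whose flag is set ('+') on this line."""
    flags = parse_flags(line)
    return [name for name in names if flags.get(name)]

# Lines copied from the lspci -vv output above.
STATUS = ("Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast "
          ">TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-")
DEVSTA = "DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-"

PCI_ERROR_FLAGS = ["ParErr", "<PERR", ">SERR"]
PCIE_ERROR_FLAGS = ["CorrErr", "UncorrErr", "FatalErr", "UnsuppReq"]

print(errors_set(STATUS, PCI_ERROR_FLAGS))   # []
print(errors_set(DEVSTA, PCIE_ERROR_FLAGS))  # []
```

Every error flag in this output is clear, but these bits only reflect the state at the moment lspci ran; a reboot after the error would normally reset them, so this does not rule out the earlier parity error.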

All firmware is updated: BIOS, PSU, NIC daughter card, NIC PCI card, RAID, iDRAC, etc.

I cleared the hardware log; it had many errors of this type:

2017-09-28T19:06:41-0300 RDU0011 The power supplies are redundant.
2017-09-28T19:06:36-0300 RDU0012 Power supply redundancy is lost.
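If the exported log holds many of these pairs, it can help to see when and for how long redundancy was lost each time. A minimal Python sketch, assuming the lines keep exactly the `timestamp message-ID text` layout shown above and that RDU0012 marks loss and RDU0011 marks restoration, as in these two entries; `parse` and `redundancy_outages` are hypothetical names:

```python
from datetime import datetime

LOG = """\
2017-09-28T19:06:41-0300 RDU0011 The power supplies are redundant.
2017-09-28T19:06:36-0300 RDU0012 Power supply redundancy is lost.
"""

def parse(log_text):
    """Split each line into (timestamp, message ID, text), oldest first."""
    events = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp, msg_id, text = line.split(" ", 2)
        events.append((datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z"), msg_id, text))
    events.sort(key=lambda e: e[0])  # the log lists newest entries first
    return events

def redundancy_outages(events, lost_id="RDU0012", restored_id="RDU0011"):
    """Pair each 'redundancy lost' event with the next 'redundant' event."""
    outages = []
    lost_at = None
    for ts, msg_id, _ in events:
        if msg_id == lost_id and lost_at is None:
            lost_at = ts
        elif msg_id == restored_id and lost_at is not None:
            outages.append((lost_at, ts - lost_at))
            lost_at = None
    return outages

for start, duration in redundancy_outages(parse(LOG)):
    print(start.isoformat(), duration.total_seconds(), "s")  # 2017-09-28T19:06:36-03:00 5.0 s
```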

The iDRAC Lifecycle Log has many errors from the past few days.

Today I set the BIOS to High Performance; before, I think it was set to "high performance radc"...

Maybe it is some parameter in the latest BIOS?

Thx

Moderator

RE: R720 HW error cause reboot or core crash

00:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address Map, VTd_Misc, System Management (rev 07)

That doesn't seem to be a PCIe slot. It looks like the systems management bus.

All of the power messages could be caused by improper shutdowns. If the system is crashing due to PCIe fatal errors then the PSU errors could be a symptom of the problem.

I would suggest that you start with whatever changed recently. If anything was changed when this issue started then that is the likely cause. This could be a hardware or software issue.

Thanks

Daniel Mysinger
Dell EMC, Enterprise Engineer
