Unsolved

1246511

May 27th, 2010 23:00

PCIE Fatal Err: Critical Event sensor, bus fatal error (Bus 0 Device 1 Function 0) was asserted

I found these errors in /var/log/messages and the System Event Log:

May 21 11:14:23 Server Administrator: Instrumentation Service EventID: 1014  System software event:  Description: Err Reg Pointer: Link Tuning sensor, OEM Diagnostic data event was asserted  Date and time of action: Tue Jul  7 07:54:11 1970
May 21 11:14:23 Server Administrator: Instrumentation Service EventID: 1014  System software event:  Description: PCIE Fatal Err: Critical Event sensor, bus fatal error (Bus 0 Device 1 Function 0) was asserted  Date and time of action: Tue Jul  7 07:54:11 1970
May 21 11:14:23 Server Administrator: Instrumentation Service EventID: 1014  System software event:  Description: PCIE Fatal Err: Critical Event sensor, bus fatal error (Bus 1 Device 0 Function 0) was asserted  Date and time of action: Tue Jul  7 07:54:11 1970

System: PowerEdge R710
Server OS: Centos 5.5

lspci
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 13)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13)
00:04.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 4 (rev 13)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 13)
00:06.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 6 (rev 13)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 13)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 13)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 13)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 13)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801IB (ICH9) 2 port SATA IDE Controller (rev 02)
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
07:00.0 PCI bridge: Intel Corporation 6702PXH PCI Express-to-PCI Bridge A (rev 09)
08:08.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
09:03.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)

Is this something I should worry about? I rebooted several times and the error didn't come back.
I'm not sure whether I should log a job with Dell support.
Let me know if you need more info.

Thanks in advance.

38 Posts

December 10th, 2010 06:00

I found my PE R710 server (ESX 4.1i installed) hung during startup today. The same error as above (PCIE Fatal Err) is showing on my ESX server. Does anyone have a clue about the above poster's problem?

 

System Event Log:

Severity | Date/Time | Description
Normal | Thu Dec 09 2010 22:40:30 | Err Reg Pointer: OEM sensor, OEM Diagnostic data event was asserted
Critical | Thu Dec 09 2010 22:40:30 | PCIE Fatal Err: Critical Event sensor, bus fatal error (Bus 0 Device 5 Function 0) was asserted
Critical | Thu Dec 09 2010 22:40:29 | PCIE Fatal Err: Critical Event sensor, bus fatal error (Slot 2) was asserted

2 Posts

March 1st, 2012 01:00

coafark: Was this resolved? If yes, how?

I got the exact same error on my PE R710, followed by a hardware error. My PCIe SSD drive is not working any more:

System software event:

Description: Err Reg Pointer: OEM sensor, OEM Diagnostic data event was asserted

Date and time of action: Sun Mar 01 10:28:45 1970

System software event:

Description: PCIE Fatal Err: Critical Event sensor, bus fatal error (Bus 0 Device 9 Function 0) was asserted

Date and time of action: Sun Mar 01 10:28:45 1970

System software event:

Description: PCIE Fatal Err: Critical Event sensor, bus fatal error (Slot 3) was asserted

Date and time of action: Sun Mar 01 10:28:45 1970

-----

A fatal hardware error has occurred.

Component: PCI Express Root Port

Error Source: Generic

Bus:Device:Function: 0x0:0x9:0x0

Vendor ID:Device ID: 0x8086:0x3410

Class Code: 0x60400

July 3rd, 2013 00:00

I am seeing a similar error on an R620. My system hangs with a similar error: "PCI1320 Bus fatal error on bus 7 device 0 function 1. Power cycle system". Has anyone found a solution yet?

3 Posts

December 3rd, 2014 06:00

Normally, lspci on Linux will show you what is on Bus 7 Device 0 Function 1. Most systems have different cards installed, so the addresses will differ. I recommend running a DSET report from support.dell.com/dset and calling support. In the report's data folder there is a RAWXML folder containing slots.xml. You can get a pretty good idea of which device is causing the error by searching that file for busnumber or devicenumber. As a best practice, always keep your system updated to the latest versions, and remove any third-party cards to test.
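For anyone who wants to script the lookup, here is a rough Python sketch, not an official Dell tool. It pulls the "Bus N Device N Function N" location out of an event log message and finds the matching line in lspci output. The function names are made up for illustration, and it assumes the numbers in the log message are decimal (lspci prints them in hex). That assumption is consistent with the logs posted in this thread but is not confirmed anywhere.

```python
import re

# Location text as it appears in the event log lines quoted in this thread.
BDF_PATTERN = re.compile(r"Bus (\d+) Device (\d+) Function (\d+)", re.IGNORECASE)


def sel_to_lspci_address(message):
    """Return the lspci-style 'bb:dd.f' address named in a log message, or None.

    Assumption: the log prints bus/device/function in decimal; lspci uses hex.
    """
    match = BDF_PATTERN.search(message)
    if match is None:
        return None  # e.g. messages that only name a slot, like "(Slot 2)"
    bus, device, function = (int(part) for part in match.groups())
    return f"{bus:02x}:{device:02x}.{function:x}"


def find_device(lspci_output, address):
    """Return the lspci line that starts with the given address, or None."""
    for line in lspci_output.splitlines():
        if line.strip().startswith(address + " "):
            return line.strip()
    return None


if __name__ == "__main__":
    # Two lines copied from the lspci listing in the first post.
    lspci_text = (
        "00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13)\n"
        "01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)\n"
    )
    log_line = ("PCIE Fatal Err: Critical Event sensor, bus fatal error "
                "(Bus 1 Device 0 Function 0) was asserted")
    address = sel_to_lspci_address(log_line)
    print(address, "->", find_device(lspci_text, address))
```

On a live system you would feed it the real output of lspci instead of the pasted sample. For the R620 message above, the address to look for would be 07:00.1.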

3 Posts

December 3rd, 2014 06:00

That normally points to the network. One of your NICs, and possibly the motherboard, needs an update. I'd recommend either using the Lifecycle Controller to perform a platform update or booting from Dell's latest repository ISO at dell.app.box.com/Bootabler710. The error may come back, and it may not. If it does, it may be an onboard NIC failure that, in the worst-case scenario, requires parts replacement.

3 Posts

December 3rd, 2014 07:00

vivekrs...your issue is whatever card you have in slot 3. If it's Dell hardware, update the firmware on the card and the motherboard. If it's third party, update the firmware on the card and on the board. If the error persists, pull the card. If it's Dell hardware, call support. They may need you to swap card slots to see whether the error follows the card or occurs only when the card is in slot 3.

2 Posts

December 3rd, 2014 10:00

Looks like somebody is on a spring-cleaning spree... :)

This thread is from eons ago - I don't even remember posting that message! Whatever I did (or did not) fixed (or circumvented) the error I was having at the time.

Thanks for your help though zsstorm!
