1425689

March 6th, 2007 07:00

CPU Machine Chk: processor sensor, transition to non-recoverable

New PowerEdge 2950 server, freshly installed with all the latest firmware and drivers.
It rebooted today with the following error in the ESM log:
 
"CPU Machine Chk: processor sensor, transition to non-recoverable"
 
Why did this happen, how can I resolve the issue?
I have a DSET report available if someone cares to take a look.
 
Thank you,
Drazen

7 Posts

February 11th, 2012 07:00

I'm facing exactly the same problem on a PE1950. All BIOS and firmware are up to date; DSET doesn't report any error on update and shows only these errors:

Thu Feb 9 14:24:13 2012  A PCIe error detected on a component at bus 4 device 0 function 0.  0x0200020DD7334FB10004C2266FA00004h

 Thu Feb 9 14:24:14 2012  A PCIe error detected on a component at bus 0 device 28 function 0.  0x0900020ED7334FB10004C2266FA0E000h

 Thu Feb 9 14:24:14 2012  A PCIe error detected on a component at bus 4 device 0 function 0.  0x0800020ED7334FB10004C2266FA00004h

 Thu Feb 9 14:24:14 2012  An OEM diagnostic event has occurred.  0x0700020ED7334FB10004C11A7E014540h

 Thu Feb 9 14:24:14 2012  A PCIe error detected on a component at bus 0 device 28 function 0.  0x0600020ED7334FB10004C2266FA0E000h

 Thu Feb 9 14:24:14 2012  A PCIe error detected on a component at bus 4 device 0 function 0.  0x0500020ED7334FB10004C2266FA00004h

 Thu Feb 9 14:24:14 2012  An OEM diagnostic event has occurred.  0x0400020ED7334FB10004C11A7E014540h

 Thu Feb 9 14:24:14 2012  A PCIe error detected on a component at bus 0 device 28 function 0.  0x0300020ED7334FB10004C2266FA0E000h

 Thu Feb 9 14:24:15 2012  An OEM diagnostic event has occurred.  0x0D00020FD7334FB10004C11A7E014540h

 Thu Feb 9 14:24:15 2012  A PCIe error detected on a component at bus 0 device 28 function 0.  0x0C00020FD7334FB10004C2266FA0E000h

 Thu Feb 9 14:24:15 2012  A PCIe error detected on a component at bus 4 device 0 function 0.  0x0B00020FD7334FB10004C2266FA00004h

 Thu Feb 9 14:24:15 2012  An OEM diagnostic event has occurred.  0x0A00020FD7334FB10004C11A7E014540h

 Thu Feb 9 14:24:25 2012  CPU 1 has an internal error (IERR).  0x0E000219D7334F20000407606F00FFFFh

 Thu Feb 9 14:24:26 2012  CPU 2 has an internal error (IERR).  0x0F00021AD7334F20000407616F00FFFFh

 Thu Feb 9 14:24:40 2012  CPU 1 is operating correctly.  0x10000228D7334F2000040760EF00FFFFh

 Thu Feb 9 14:24:41 2012  CPU 2 is operating correctly.  0x11000229D7334F2000040761EF00FFFFh

 Thu Feb 9 14:25:54 2012  An OEM diagnostic event has occurred.  0x14000272D7334FB10004C1287E000004h

 Thu Feb 9 14:25:54 2012  An OEM diagnostic event has occurred.  0x13000272D7334FB10004C1287E010400h

 Thu Feb 9 14:25:54 2012  CPU 1 machine check detected.  0x12000272D7334FB10004070D07A60100h

 Thu Feb 9 14:25:55 2012  A bus fatal error was detected on a component at bus 4 device 0 function 0.  0x17000273D7334FB1000413186FAA0004h

Any hint?
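
For anyone triaging a log like the one above, a small script can tally the PCIe and bus events per device and print each address in the hex bus:device.function form that lspci uses, which makes it easier to match the log against the hardware. This is only a sketch: it assumes the log lines look exactly like the ones pasted above and that the bus/device/function numbers in the log are decimal (so device 28 would be 1c). The function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the PCIe / bus-fatal lines as they appear in the pasted ESM log.
PCI_EVENT = re.compile(
    r"(?P<kind>PCIe error|bus fatal error).*?"
    r"bus (?P<bus>\d+) device (?P<dev>\d+) function (?P<fn>\d+)"
)


def lspci_address(bus, dev, fn):
    """Format bus/device/function as lspci prints it (hex bus:dev.fn).

    Assumption: the numbers in the ESM log are decimal.
    """
    return f"{bus:02x}:{dev:02x}.{fn:x}"


def count_pci_events(log_text):
    """Count events per (address, event kind); other log lines are ignored."""
    counts = Counter()
    for line in log_text.splitlines():
        m = PCI_EVENT.search(line)
        if m:
            addr = lspci_address(int(m["bus"]), int(m["dev"]), int(m["fn"]))
            counts[(addr, m["kind"])] += 1
    return counts


# A few lines copied from the log above (hex event data omitted).
SAMPLE = """\
Thu Feb 9 14:24:13 2012  A PCIe error detected on a component at bus 4 device 0 function 0.
Thu Feb 9 14:24:14 2012  A PCIe error detected on a component at bus 0 device 28 function 0.
Thu Feb 9 14:24:25 2012  CPU 1 has an internal error (IERR).
Thu Feb 9 14:25:55 2012  A bus fatal error was detected on a component at bus 4 device 0 function 0.
"""

for (addr, kind), n in sorted(count_pci_events(SAMPLE).items()):
    print(addr, kind, n)
```

The addresses it prints can then be compared with the output of lspci on the affected box to see which device the events point at.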

7 Posts

February 11th, 2012 13:00

Just the PERC RAID controller, no additional cards. All firmware is up to date:

>update_firmware

Running system inventory...

Searching storage directory for available BIOS updates...
Checking BIOS - 2.7.0
        Available: dell_dup_componentid_00159 - 2.7.0
        Did not find a newer package to install that meets all installation checks.
Checking SAS/SATA Backplane 0:0 Backplane Firmware - 1.05
        Available: dell_dup_componentid_11204 - 1.05
        Did not find a newer package to install that meets all installation checks.
Checking System BIOS for PowerEdge 1950 - 2.7.0
        Did not find a newer package to install that meets all installation checks.
Checking PERC 5/i Integrated Controller 0 Firmware - 5.2.2-0072
        Available: pci_firmware(ven_0x1028_dev_0x0015_subven_0x1028_subdev_0x1f03) - 5.2.2-0072
        Did not find a newer package to install that meets all installation checks.
Checking ST973402SS Firmware - s206
        Did not find a newer package to install that meets all installation checks.
Checking ST9146802SS Firmware - s206
        Did not find a newer package to install that meets all installation checks.
Checking BMC - 2.37
        Available: dell_dup_componentid_05814 - 2.37
        Did not find a newer package to install that meets all installation checks.
Checking DRAC 5 Firmware - 1.60
        Available: dell_dup_componentid_08735 - 1.60
        Did not find a newer package to install that meets all installation checks.

This system does not appear to have any updates available.
No action necessary.


9 Legend

16.3K Posts

February 11th, 2012 13:00

Other than a RAID controller, what expansion cards do you have?  Which RAID controller do you have?  Is ALL firmware (BIOS, ESM/BMC, NIC, DRAC, and RAID) up to date?

7 Posts

February 11th, 2012 14:00

What do you mean by resetting the RAID controller? How? And how do I reset each PCIe riser?

The DSET I ran was from the CentOS 6 OMSA 6.5 ISO recommended by Dell support.

I've also tested the RAM with memtest, which didn't find any errors.

Thanks for your support

9 Legend

16.3K Posts

February 11th, 2012 14:00

I would suggest reseating the RAID controller and its "sideplane" and each PCIe riser.  I would also recommend running 32-bit diagnostics - primarily for testing the PERC, memory, NICs, and motherboard, any of which can contribute to this error.

9 Legend

16.3K Posts

February 11th, 2012 14:00

Not "reset", "reseat" ... removing a device and reinserting it is called "reseating" the device.

In the Hardware Owner's Manual:

RAID controller - page 56
Sideplane - page 85
Expansion risers - page 82

<ADMIN NOTE: Broken link has been removed from this post by Dell>

 


 

 

9 Legend

16.3K Posts

February 11th, 2012 15:00

No problem :)  Good luck!

7 Posts

February 11th, 2012 15:00

Oh, sorry, I misread :) OK, I'll try to reseat those devices on Monday; the server is at the office right now.

I'll let you know.

Thanks

7 Posts

February 13th, 2012 02:00

I've run these tests:

Reseated all the listed devices.

Moved the HDs from the 1950 with errors to another 1950, and no error appeared.

Formatted the disks on the error-free 1950 and moved them to the faulty 1950; the error appeared after the move.

I've also noticed that the error starts only once OpenManage is installed (I tried with just the OS: 5 reboots and no error came up).

The two 1950 im using for tests are exactly the same:

2x Dual Core CPU

(same disks moving between boxes)

4GB RAM (4x1GB)

PERC 5/i 256MB with battery

Same IDRAC model.

Any other hints? I've been fighting with this for about 2 weeks now; I think I should send it back to the seller and have him replace this box.

Thanks

7 Posts

February 13th, 2012 04:00

Another huge difference between the two servers:

Working 1950:

[root@server ~]# update_firmware --yes

Running system inventory...

Searching storage directory for available BIOS updates...

Checking SAS/SATA Backplane 0:0 Backplane Firmware - 1.05

       Available: dell_dup_componentid_11204 - 1.05

       Did not find a newer package to install that meets all installation checks.

Checking System BIOS for PowerEdge 1950 - 2.7.0

       Did not find a newer package to install that meets all installation checks.

Checking NetXtreme II BCM5708 Gigabit Ethernet rev 12 (eth0) - 6.2.16

       Available: pci_firmware(ven_0x14e4_dev_0x164c) - 6.2.16

       Did not find a newer package to install that meets all installation checks.

Checking PERC 5/i Integrated Controller 0 Firmware - 5.2.2-0072

       Available: pci_firmware(ven_0x1028_dev_0x0015_subven_0x1028_subdev_0x1f03) - 5.2.2-0072

       Did not find a newer package to install that meets all installation checks.

Checking ST973402SS Firmware - s206

       Did not find a newer package to install that meets all installation checks.

Checking NetXtreme II BCM5708 Gigabit Ethernet rev 12 (eth1) - 6.2.16

       Available: pci_firmware(ven_0x14e4_dev_0x164c) - 6.2.16

       Did not find a newer package to install that meets all installation checks.

Checking ST9146802SS Firmware - s206

       Did not find a newer package to install that meets all installation checks.

Checking BMC - 2.37

       Available: dell_dup_componentid_05814 - 2.37

       Did not find a newer package to install that meets all installation checks.

Checking BIOS - 2.7.0

       Available: dell_dup_componentid_00159 - 2.7.0

       Did not find a newer package to install that meets all installation checks.

Checking DRAC 5 Firmware - 1.60

       Available: dell_dup_componentid_08735 - 1.60

       Did not find a newer package to install that meets all installation checks.

This system does not appear to have any updates available.

No action necessary.

NOT working 1950:

[root@server ~]# update_firmware

Running system inventory...

Searching storage directory for available BIOS updates...

Checking BIOS - 2.7.0

       Available: dell_dup_componentid_00159 - 2.7.0

       Did not find a newer package to install that meets all installation checks.

Checking SAS/SATA Backplane 0:0 Backplane Firmware - 1.05

       Available: dell_dup_componentid_11204 - 1.05

       Did not find a newer package to install that meets all installation checks.

Checking System BIOS for PowerEdge 1950 - 2.7.0

       Did not find a newer package to install that meets all installation checks.

Checking PERC 5/i Integrated Controller 0 Firmware - 5.2.2-0072

       Available: pci_firmware(ven_0x1028_dev_0x0015_subven_0x1028_subdev_0x1f03) - 5.2.2-0072

       Did not find a newer package to install that meets all installation checks.

Checking ST973402SS Firmware - s206

       Did not find a newer package to install that meets all installation checks.

Checking ST9146802SS Firmware - s206

       Did not find a newer package to install that meets all installation checks.

Checking BMC - 2.37

       Available: dell_dup_componentid_05814 - 2.37

       Did not find a newer package to install that meets all installation checks.

This system does not appear to have any updates available.

No action necessary.

The non-working one doesn't seem to recognize the Ethernet cards.
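
A difference like this is easy to miss by eye in two long listings, so here is a rough sketch of how the two update_firmware outputs could be compared mechanically. It assumes every component is reported on a line of the form "Checking NAME - VERSION", as in the output above; the function names and the shortened sample text are only illustrative.

```python
import re

# One component per "Checking <name> - <version>" line, as in the output above.
CHECKING = re.compile(r"^\s*Checking (?P<name>.+) - (?P<version>\S+)\s*$")


def parse_inventory(output):
    """Return {component name: installed version} from update_firmware output."""
    inventory = {}
    for line in output.splitlines():
        m = CHECKING.match(line)
        if m:
            inventory[m["name"]] = m["version"]
    return inventory


def compare_inventories(good, bad):
    """Return (components missing from 'bad', {name: (good ver, bad ver)})."""
    missing = sorted(set(good) - set(bad))
    changed = {
        name: (good[name], bad[name])
        for name in good.keys() & bad.keys()
        if good[name] != bad[name]
    }
    return missing, changed


# Shortened excerpts of the two listings above.
WORKING = """\
Checking System BIOS for PowerEdge 1950 - 2.7.0
Checking NetXtreme II BCM5708 Gigabit Ethernet rev 12 (eth0) - 6.2.16
Checking PERC 5/i Integrated Controller 0 Firmware - 5.2.2-0072
Checking NetXtreme II BCM5708 Gigabit Ethernet rev 12 (eth1) - 6.2.16
Checking BMC - 2.37
"""
NOT_WORKING = """\
Checking System BIOS for PowerEdge 1950 - 2.7.0
Checking PERC 5/i Integrated Controller 0 Firmware - 5.2.2-0072
Checking BMC - 2.37
"""

missing, changed = compare_inventories(
    parse_inventory(WORKING), parse_inventory(NOT_WORKING)
)
for name in missing:
    print("missing on the non-working server:", name)
for name, (good_ver, bad_ver) in changed.items():
    print(f"version differs for {name}: {good_ver} vs {bad_ver}")
```

With the excerpts above, the only components that show up as missing are the two NetXtreme II entries, which matches what the full listings show.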

The curious part is this:

Running Transaction

 Installing     : BMC_Firmware_componentid_05814_for_PowerEdge_1950    1/6

 Installing     : PERC_5_i_Integrated_ven_0x1028_dev_0x0015_subven_0x1028_subdev_0x1f03    2/6

 Installing     : Server_BIOS_componentid_00159_for_PowerEdge_1950    3/6

 Installing     : SAS_Backplane_Firmware_componentid_11204_for_PowerEdge_1950    4/6

 Installing     : dell_ie_nic_broadcom    5/6

 Installing     : BCM5708_Copper_LOM_ven_0x14e4_dev_0x164c

yum actually installed BCM5708_Copper_LOM_ven_0x14e4_dev_0x164c from the repository, which should be the update for the Ethernet cards.

Another issue I've noticed: if I manually download that package (BIN) and install it by hand, it fails with an error saying it cannot find any Ethernet card...

Regards

9 Legend

16.3K Posts

February 13th, 2012 08:00

If you have acquired these recently and have the ability to return it, I would do that, as something is definitely not right.

7 Posts

February 13th, 2012 10:00

I've managed to fix this issue.

The fact that the network adapter wasn't listed made me think...

So I installed Win 2k8 Server and updated the firmware of the Ethernet device from there (from v2.x.x to v6.x.x), then installed OMSA on Windows, and the error didn't come up.

Then I reinstalled Linux (CentOS 5.7 64-bit) with OMSA and the problem hasn't come back (6 reboots so far).

Hope this will be useful to someone else in the future.

Regards, and thanks for your help.

39 Posts

January 22nd, 2013 13:00

NO, this did not *FIX* the problem - this simply "cleared the error temporarily."

Clearing out the log does NOT fix the problem.

39 Posts

January 22nd, 2013 13:00

My above comment was to the person who said he "FIXED" the issue by running a "clear log" shell script.

29 Posts

August 22nd, 2014 13:00

Thanks so much.  I purchased a "refurbished" 1950II off eBay and after installing Ubuntu 14.04 Server I would randomly get CPU IERR errors and device errors.

I updated the BIOS, BMC, PERC, and Broadcom firmware and all is working perfectly.  Glad I ran across this as I was about to try to send it back to the seller.
