January 29th, 2014 00:00
"Transition to non-recoverable" machine check error
Hi. I am using a PowerEdge R815 with four AMD Opteron 6272 processors.
The system halted, and I found the following entry in the system event log:
Sensor ID : CPU Machine Chk (0xd)
Description : Transition to Non-recoverable
Entity ID : 34.1
Generator ID : 00b1
I want to find the error code for this machine check error, as described in the AMD specification (BKDG for Family 15h).
As far as I know, when a machine check error occurs, the error code can be read from the MCA status registers (e.g. MC0 to MC6) after the next warm reset.
However, after a warm reset, none of the MCA status registers showed a machine check error. (I wrote a Linux device driver to read them.)
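For reference, the register layout this procedure relies on can be sketched as follows. This is a minimal sketch, not the driver mentioned above: it assumes the architectural x86 MCA layout (bank i has its status register at MSR 0x401 + 4*i, with the flag bits in the top byte and the MCA error code in bits 15:0), and the helper names `status_msr`, `decode_status`, and `read_msr` are hypothetical. `read_msr` further assumes a Linux host with the `msr` module loaded and root access; the vendor-specific meaning of the error code still has to be looked up in the BKDG.

```python
import os
import struct

# Architectural x86 MCA layout: bank i's STATUS register is MSR 0x401 + 4*i.
MC0_STATUS = 0x401

# Flag bits in the top byte of MCi_STATUS.
STATUS_FLAGS = {
    "VAL": 63,    # register holds a valid error record
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MCi_MISC holds valid data
    "ADDRV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context may be corrupt
}


def status_msr(bank):
    """Return the MSR address of MCi_STATUS for the given bank number."""
    return MC0_STATUS + 4 * bank


def decode_status(value):
    """Split a 64-bit MCi_STATUS value into its flags and MCA error code."""
    fields = {name: bool((value >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    fields["error_code"] = value & 0xFFFF  # bits 15:0
    return fields


def read_msr(cpu, msr):
    """Read one 64-bit MSR through /dev/cpu/<cpu>/msr (assumed setup:
    Linux, 'msr' module loaded, running as root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)
```

On a real machine, something like `decode_status(read_msr(0, status_msr(bank)))` for each bank 0 to 6 would show whether any bank still has VAL set after the reset. If VAL is clear in every bank, no error record survived in those registers.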
My first question: is my procedure for getting the error information from the AMD MCA correct?
My second question: how, then, can I get the error code that caused the machine check error?
My third question: what does "Transition to non-recoverable" mean?
Thanks.


DELL-Chris H
January 29th, 2014 06:00
Pat,
There are a few things that can cause this error. It doesn't necessarily mean there is an issue with the processor itself. A CPU Machine Chk is logged when the processor detects an error while executing instructions, or when the system tells it to raise one. I would like to get some additional information from you to help diagnose what is causing the issue.
What is the OS on the system?
Are you able to boot the server, and if so when do you get the error or halt?
Lastly, which BIOS and ESM/DRAC firmware revisions are you running? I ask because I have seen out-of-date systems cause similar issues.
Also, this link helps break down the error and has some good information: ftp://ftp.dell.com/Manuals/all-products/esuprt_ser_stor_net/esuprt_poweredge/poweredge-7250_Reference%20Guide_en-us.pdf
Let me know.
dummypat
January 29th, 2014 07:00
Thanks for your fast reply.
2301 Rev 3.14 - January 23, 2013 BKDG for AMD Family 15h Models 00h-0Fh Processors
Error Type: Master Abort
Description: Master abort seen as result of link operation. Reasons for this error include requests to non-existent addresses, and requesting extended addresses while extended mode disabled (see D18F0x[E4,C4,A4,84][Addr64BitEn]). The NB returns an error response back to the requestor with any associated data all 1s independent of the state of the control bit.
CTL: MstrAbortEn
ETG: L
EAC: D

Error Type: Target Abort
Description: Target abort seen as result of link operation. The NB returns an error response back to the requestor with any associated data all 1s independent of the state of the control bit.
CTL: TgtAbortEn
ETG: L
EAC: D

Error Type: GART Error
Description: GART cache table walk encountered a GART PTE entry which was invalid.
CTL: GartTblWkEn
ETG: L
EAC: D
Thanks.