Unsolved

May 12th, 2020 17:00

PowerEdge T420 running hardware diagnostics

We have a PowerEdge T420 that has been running fine for 5+ years. Recently, after reboots, it has started getting stuck on the memory test, and we have to power it off and back on before it will boot into Windows. I also found a critical event in the event logs related to memory, so I'm starting to think it's a memory issue.

A fatal hardware error has occurred.

Component: Memory

Error Source: Machine Check Exception

 

The server is at a remote site, so I'm trying to walk someone through running the built-in diagnostics so we can do a memory test, but partway through we get an alert that "The event indicates degraded or disabled ECC functionality. Memory testing cannot continue until the problems are corrected, the log cleared and the system rebooted".

At this point I have done some research and found that I probably need to clear the log files. We could not find anywhere to clear them in the diagnostic tool. I installed the Dell EMC SupportAssist Enterprise tool, downloaded the logs and then tried to clear them, but the "Clear System Event Log" option is greyed out. I then found out from the article below that I need to install OMSA. I did this and rebooted, but the option is still greyed out.

"If OMSA is not installed on a device that you have added in SupportAssist Enterprise with the Device Type as Server, the Clear System Event Log option is disabled."

https://www.dell.com/support/manuals/au/en/aubsd1/supportassist-enterprise-v1.0/sae10ug/clearing-the-system-event-log-sel?guid=guid-7526b295-802d-4e7d-a799-6564eff1c8d7&lang=en-us

 

I normally deal with HP servers, so I haven't done any of this before. Why is it so hard to run a diagnostics tool? Can anyone point me in the right direction to clear these logs, and is that even the reason why it can't test the memory properly?

Are there any online Dell diagnostics tools that I can run?

Moderator

May 13th, 2020 00:00

Hi,

 

You can clear the logs via iDRAC, the Lifecycle Controller, or the BIOS. Do you have iDRAC access? It would be good to have it set up for remote access so you can clear logs and view hardware issues. https://dell.to/2T2zx7c
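
If the iDRAC is reachable over the network and the remote racadm utility is installed on your workstation, the review-then-clear sequence could be scripted roughly as follows. This is a minimal sketch, not an official Dell procedure. The function names, host address and credentials are hypothetical placeholders. It assumes the racadm `getsel` and `clrsel` subcommands and the `-r`/`-u`/`-p` remote options are available on your iDRAC firmware.

```python
import subprocess


def racadm_cmd(host, user, password, subcommand):
    """Build the argument list for a remote racadm call (hypothetical helper)."""
    return ["racadm", "-r", host, "-u", user, "-p", password, subcommand]


def review_then_clear_sel(host, user, password, run=subprocess.run):
    """Print the System Event Log, then clear it.

    `run` is injectable so the sequence can be checked without a real iDRAC.
    """
    # Dump the log first so the memory errors are on record before clearing.
    listing = run(racadm_cmd(host, user, password, "getsel"),
                  capture_output=True, text=True, check=True)
    print(listing.stdout)
    # Clear the log, as the diagnostics require before memory testing resumes.
    run(racadm_cmd(host, user, password, "clrsel"), check=True)
```

Calling `review_then_clear_sel("192.0.2.10", "root", "your-password")` would run both steps against that placeholder address. Keep a copy of the printed log, since the entries may be needed later to identify the failing DIMM.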

 

Could you check whether the server's BIOS and LCC firmware are up to date? There are some fixes to the processor and memory microcode.

 

Is OMSA installed and accessible? If so, you can check the logs for any errors that give some leads on the issue before clearing them.
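
With OMSA installed, the hardware log can also be read and cleared from a command prompt on the server itself. The sketch below assumes the OMSA CLI commands `omreport system esmlog` (view) and `omconfig system esmlog action=clear` (clear) are on the PATH. The keyword filter and the sample log text in the usage note are illustrative assumptions, not real OMSA output.

```python
import subprocess

# OMSA CLI commands for viewing and clearing the hardware (ESM) log.
OMSA_VIEW = ["omreport", "system", "esmlog"]
OMSA_CLEAR = ["omconfig", "system", "esmlog", "action=clear"]


def memory_lines(esmlog_text):
    """Return the log lines that mention memory or ECC (simple keyword filter)."""
    keywords = ("memory", "ecc", "dimm")
    return [line for line in esmlog_text.splitlines()
            if any(word in line.lower() for word in keywords)]


def review_then_clear(run=subprocess.run):
    """Print memory-related entries, then clear the log."""
    report = run(OMSA_VIEW, capture_output=True, text=True, check=True)
    for line in memory_lines(report.stdout):
        print(line)
    run(OMSA_CLEAR, check=True)
```

For example, `memory_lines("Fan OK\nMulti-bit ECC error on DIMM_A1")` keeps only the second line. Note which DIMM slot the entries name before clearing, since that is the lead on which module to reseat or replace.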
