PowerEdge Hardware General

Last reply: 07-06-2022 (Unsolved)

R420 randomly rebooting by itself; need help troubleshooting it

This server had an uptime of 450 days and had been working fine since I bought it; it never had any problems. Two days ago I upgraded its RAM, and according to my monitoring (Zabbix running in a VM on this server) it has rebooted 3 times since then. It had 4x 8 GB and I added another 4x 8 GB, so the server now has 64 GB of installed memory. All the memory modules are the same model, M393B1K70DH0-YK0, both the old ones and the new ones. After the installation I ran about 90% of the Lifecycle Controller diagnostic tests. They included many memory tests, and everything seemed fine. I rebooted the server, it booted ESXi successfully, and I called it a day. Then, today, I noticed in the monitoring that the server had rebooted 3 times, and I want to start troubleshooting it, for which I need some help.

This is the information I have access right now, through the ESXi:

[Image: screenshot from ESXi]

The original 4x 8 GB memory modules were installed in slots A1, A2, B1 and B2 (2 CPUs). I placed the new modules in slots A3, A4, B3 and B4. The following images show how each set of slots looked before and after the memory upgrade:

 

[Image: the slots as they were before the upgrade]
[Image: how the slots are now, after the upgrade]



Today I went to the data center to check it on site, and there were 2 amber lights blinking (picture). I couldn't make them stop by pressing the button on the front panel (the one next to the power button) or the one on the back (which was blinking in the same color and rhythm as the front lights).

[Image: lights blinking when I arrived at the data center today]

iDRAC isn't accessible right now because I don't remember its configuration. I will eventually have to shut the machine down, but I want to gather as much information as possible before doing so, to improve the chances of successful troubleshooting. I will also reconfigure iDRAC when I shut the server down, so we will have better ways of monitoring it.

Questions:
- What could those 2 lights mean?
- Did I place the new memory modules incorrectly?
- What else could be causing the problem in a machine that never had problems since it started working?

PS: I tried to upgrade the server's total memory when I received it. I bought 2 twin machines but was going to use only one, so I took all the memory from the second machine and installed it in the first. I got a lot of errors trying to install and run things on the server, and I ended up removing the added memory and using the server with only the 32 GB it arrived with. At the time, I thought I had misplaced the memory or messed up something else in the configuration... This time (the new modules I installed now are NOT the same modules I tried to use in the past, the ones from the spare server), I thought I had placed the modules in the correct slots for this configuration (8x 8 GB), but it seems I didn't. Or I'm having an unrelated problem that I'm not aware of.

What do you guys suggest I check first?

Replies (16)

You can use B1 and B2 if A3 and A4 are defective. It is recommended to have the same memory population on both CPUs of the same server. Refer to the link below for memory module installation guidelines.

https://www.dell.com/support/manuals/en-us/poweredge-r420/r420ownersmanual-v2/general-memory-module-... 


Thanks,
DELL-Shine K
#IWork4Dell


Just to add a little more: https://dell.to/3IiLVsq


DELL-Young E
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#IWork4Dell

Did I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!


Hi, even if your CPU usage is low, the temperature may still be high.


The manual has no scenario using A1, A2, A3, A4 and B1, B2, B3, B4, and that is exactly my problematic configuration. Is that configuration not recommended by Dell?


Exactly @Greg asd, you need to follow the recommended memory population rules:


- DIMMs must be installed in each channel starting with the DIMM slot farthest from the processor. DIMMs should be installed in order of rank count, from largest to smallest. For example, if dual-rank (DR) DIMMs are mixed with single-rank (SR) DIMMs, the DR DIMMs should be placed in the lowest DIMM slots, followed by the SR DIMMs.


- Population order is identified by the silk screen designator and the System Information Label (SIL) located on the chassis cover. The graphic below indicates the installation order for each configuration type. In dual CPU configurations, memory should be built out evenly: A1, B1, A2, B2, etc.
Memory Optimized (Independent Channel): C1{1}, C2{1}, C1{2}, C2{2}, C1{3}, C2{3}...
Advanced ECC (Lockstep / x8 SDDC): C1{2,3}, C2{2,3}, C1{5,6}, C2{5,6}
Mirrored: C1{2,3}, C2{2,3}, C1{5,6}, C2{5,6}
Rank Sparing Population Order (Lockstep rules): C1{2,3}, C2{2,3}, C1{5,6}, C2{5,6}
Rank Sparing Population Order (Optimized): C1{1}, C2{1}, C1{2}, C2{2}, C1{3}, C2{3}...
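The Memory Optimized order quoted above (C1{1}, C2{1}, C1{2}, C2{2}, ...) can be sketched as a small Python check. This is a minimal sketch under assumptions that are not stated in the thread: that C1{n} means slot A<n> on CPU 1 and C2{n} means slot B<n> on CPU 2, and that the R420 has 6 DIMM slots per CPU. The helper names are hypothetical. It only tests the interleaving rule; it does not check rank ordering, the Advanced ECC, Mirrored or Rank Sparing orders, and it cannot detect a faulty DIMM or slot.

```python
# Hypothetical helpers sketching the "Memory Optimized" population order
# quoted above: C1{1}, C2{1}, C1{2}, C2{2}, ...
# Assumption: C1{n} maps to slot A<n> (CPU 1) and C2{n} to slot B<n> (CPU 2),
# with 6 slots per CPU by default.

def optimized_order(slots_per_cpu=6):
    """Return slot labels in the interleaved order A1, B1, A2, B2, ..."""
    order = []
    for n in range(1, slots_per_cpu + 1):
        order.append(f"A{n}")
        order.append(f"B{n}")
    return order


def follows_order(installed, slots_per_cpu=6):
    """True if the installed slots are exactly the first len(installed)
    entries of the interleaved order (no gaps, both CPUs kept balanced)."""
    order = optimized_order(slots_per_cpu)
    return set(installed) == set(order[:len(installed)])


if __name__ == "__main__":
    before = ["A1", "A2", "B1", "B2"]
    after = before + ["A3", "A4", "B3", "B4"]
    unbalanced = ["A1", "A2", "A3", "B1"]
    print(follows_order(before))      # True
    print(follows_order(after))       # True
    print(follows_order(unbalanced))  # False
```

Under these assumptions, both the original A1/A2/B1/B2 layout and the 8-DIMM A1-A4/B1-B4 layout satisfy the interleaving rule, while an unbalanced layout such as A1, A2, A3, B1 does not; passing this check alone does not rule out a hardware fault.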

 

[Image: Memory Population]


Diego López
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#Iwork4Dell


I have no idea what most of that means haha, especially the parts with the { }.

Anyway, it doesn't matter; I've tried the configuration with A1, A2, A3 / B1, B2, B3, and the server barely finishes loading ESXi. It's resetting nonstop, and the same error persists. I've already cleaned the slots with electrical contact cleaner, a soft cleaning brush and a blower. The problem definitely isn't dirt. So... what can I test now? And if the problem really is the B3 slot, what are my options for increasing the amount of memory installed in the system?


Hi, another post I found may also help:
https://dell.to/3P7uSeV


DELL-Young E
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#IWork4Dell

