PowerEdge Hardware General

Last reply by 05-31-2022 Solved

R420 chassis heating due to RAID controller card

Hi,

We purchased 6 servers from Dell India.

We are facing a severe heating issue on the top of the chassis, above the RAID card, on all 6 of these servers.

Server configuration:

R420 - single hex-core processor, 2.2 GHz, 32 GB RAM

4 x 600 GB 15000 rpm drives
(configured as 2 drives in RAID 1 + 2 drives in RAID 1)

These servers are placed in a datacenter where the temperature is around 18 deg C.

When the servers are stacked one above the other, as they normally are in a datacenter, the top of the chassis just above the RAID controller heats up to between 50.4 and 57 deg Celsius.

Dell support has changed the motherboard, RAID card, power supply etc. of one of the above servers, which is non-production (heating to 50.4 deg Celsius), but the issue is not resolved.

On closer examination of the RAID controller card, I noted that the fins of the heatsink on top of the card are perpendicular to the airflow instead of parallel to it, and are actually trapping all the heat.

The same card in an R620 server has the heatsink fins parallel to the airflow, and it remains cool.

Has anybody else in this forum noticed this, and are you facing the same heating issue?

Could you please check this and let me know the resolution to this problem?

Can the heatsink be turned by 90 degrees so that its fins are parallel to the airflow?

Thanks for your help,

Rajesh Mahadevan

Mumbai, India

Replies (28)

The servers are in a full 42U rack.

Temperatures:

cc-hyperv
SEL              | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Intrusion        | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
Fan1A RPM        | 1680.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan1B RPM        | 1560.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan2A RPM        | 5400.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan2B RPM        | 5040.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan3A RPM        | 5280.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan3B RPM        | 5040.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan4A RPM        | 1560.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan4B RPM        | 1560.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan5A RPM        | 1560.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan5B RPM        | 1560.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Inlet Temp       | 23.000     | degrees C  | ok    | na        | -7.000    | 3.000     | 42.000    | 47.000    | na        
OS Watchdog      | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
VCORE PG         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
3.3V PG          | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
5V PG            | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
USB Cable Pres   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Dedicated NIC    | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
VGA Cable Pres   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Presence         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Presence         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
PLL PG           | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
1.1V PG          | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
BP1 5V PG        | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Presence         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
VSA PG           | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
MEM VDDQ PG      | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
LCD Cable Pres   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
VTT PG           | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Presence         | 0x0        | discrete   | 0x0280| na        | na        | na        | na        | na        | na        
Status           | 0x0        | discrete   | 0x8080| na        | na        | na        | na        | na        | na        
Fan Redundancy   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Riser Config Err | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
1.5V PG          | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
PS2 PG Fail      | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
PS1 PG Fail      | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
MEM VTT PG       | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Presence         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
PCIe Slot1       | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PCIe Slot2       | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PCIe Slot3       | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PCIe Slot4       | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
A                | 0x0        | discrete   | 0x4080| na        | na        | na        | na        | na        | na        
vFlash           | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
CMOS Battery     | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
Presence         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Current 1        | 0.600      | Amps       | ok    | na        | na        | na        | na        | na        | na        
Current 2        | 0.000      | Amps       | ok    | na        | na        | na        | na        | na        | na        
Voltage 1        | 230.000    | Volts      | ok    | na        | na        | na        | na        | na        | na        
Voltage 2        | 230.000    | Volts      | ok    | na        | na        | na        | na        | na        | na        
PS Redundancy    | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Status           | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Status           | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Pwr Consumption  | 126.000    | Watts      | ok    | na        | na        | na        | 420.000   | 462.000   | na        
Power Optimized  | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
SD1              | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
SD2              | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Redundancy       | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
ECC Corr Err     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
ECC Uncorr Err   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
I/O Channel Chk  | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PCI Parity Err   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PCI System Err   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
SBE Log Disabled | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Logging Disabled | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Unknown          | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
CPU Protocol Err | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
CPU Bus PERR     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
CPU Init Err     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
CPU Machine Chk  | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Memory Spared    | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Memory Mirrored  | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Memory RAID      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Memory Added     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Memory Removed   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Memory Cfg Err   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Mem Redun Gain   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PCIE Fatal Err   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Chipset Err      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Err Reg Pointer  | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Mem ECC Warning  | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Mem CRC Err      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
USB Over-current | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
POST Err         | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Hdwr version err | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Mem Overtemp     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Mem Fatal SB CRC | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Mem Fatal NB CRC | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
OS Watchdog Time | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Non Fatal PCI Er | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Fatal IO Error   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
MSR Info Log     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Drive 0          | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Cable SAS A      | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Cable SAS B      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Cable SAS C      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Cable SAS D      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Power Cable      | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Signal Cable     | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
PFault Fail Safe | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
ROMB Battery     | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
ROMB Battery     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Riser 1 Presence | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Riser 2 Presence | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Temp             | 66.000     | degrees C  | ok    | na        | 3.000     | 8.000     | 92.000    | 97.000    | na
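For anyone who wants to pull just the temperature rows out of a listing like this, here is a minimal Python sketch. It assumes the dump is the ten-column, pipe-delimited layout that `ipmitool sensor` prints (name, reading, unit, status, then lower non-recoverable, lower critical, lower non-critical, upper non-critical, upper critical, upper non-recoverable). That column order is an assumption based on how the rows above look, and the function names are only illustrative.

```python
# Column order is assumed to match the listing above; it is not confirmed by the poster.
KEYS = ("name", "value", "unit", "status", "lnr", "lcr", "lnc", "unc", "ucr", "unr")


def parse_sensor_line(line):
    """Split one pipe-delimited sensor row into a dict; return None if malformed."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != len(KEYS):
        return None
    return dict(zip(KEYS, fields))


def to_float(text):
    """Convert a field to float, or None for 'na' and discrete hex values."""
    try:
        return float(text)
    except ValueError:
        return None


def temperature_headroom(lines):
    """Return (name, reading, upper_non_critical, headroom) for each temperature row."""
    results = []
    for line in lines:
        row = parse_sensor_line(line)
        if row is None or row["unit"] != "degrees C":
            continue
        reading = to_float(row["value"])
        unc = to_float(row["unc"])
        if reading is None or unc is None:
            continue
        results.append((row["name"], reading, unc, unc - reading))
    return results


# Three rows copied from the listing above.
SAMPLE = [
    "Fan1A RPM        | 1680.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        ",
    "Inlet Temp       | 23.000     | degrees C  | ok    | na        | -7.000    | 3.000     | 42.000    | 47.000    | na        ",
    "Temp             | 66.000     | degrees C  | ok    | na        | 3.000     | 8.000     | 92.000    | 97.000    | na",
]

for name, reading, unc, headroom in temperature_headroom(SAMPLE):
    print(f"{name}: {reading:.0f} C, warning at {unc:.0f} C, headroom {headroom:.0f} C")
```

On the rows above this reports 19 C of headroom for the inlet sensor and 26 C for the sensor labelled "Temp". Note that none of these sensors measures the chassis surface above the RAID card, so this only shows what the BMC itself considers in range.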


I can't paste temperatures from the servers where we took out the riser. But I have some info about the battery on the controller: the battery works up to 65 °C, and there is a warning against use above that.

http://i59.tinypic.com/2620h1v.jpg

prolame,

Thanks for the info.

Are these details for an R420 server?

Have you by any chance measured the temperature of the chassis, i.e. the external surface of the server over the RAID card, using a contact thermometer?

Rajesh


Hello,

I have the same problem with R420 and R620 servers (2x X540).

On each type of server, the X540 in the same slot is disabled due to a heat problem.

It's not just one server but more than 30!

Why is it always the same slot if it's not a design issue?

Regards,


Not wanting to dig up something long dead, but I was googling an issue with overheating servers and found this thread.

However, reading through the thread and the OP's posts, I find it odd that the OP would actually mark Daniel's 15 June 2015 answer as a "verified answer", considering that he makes the statement "i am absolutely disappointed with dell" on 15 Jul 2015 and that other concerns are raised about the shortened lifespan of RAID batteries and fans.

So the question is, can/does anyone other than the OP accept an answer and mark it as a "verified answer"?

As an FYI, there are workplace safety criteria for the surface temperature of items that can be touched with a finger or palm (before burns can occur). IIRC, in my jurisdiction, for metallic surfaces, 50C is considered a hold temperature, 60C a brief-contact temperature, and 70C a 60-second contact temperature before injury can occur. On that basis, 56C is likely either the outcome of poor thermal design, if such a case temperature is unintended, or, if it was intended, a thermal design that needs warning labels on the chassis to indicate that a hazardous "hot spot" exists. In either case there seems to be a bit of a fail on Dell's part w.r.t. this issue.
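To make that comparison concrete, here is a small Python sketch that checks the chassis temperatures reported in this thread against the three figures above. The limits are only the values recalled in this post ("IIRC"), not numbers taken from any standard, and the function name is made up for illustration.

```python
# Limits as recalled in the post above; treat them as assumptions, not as a standard.
CONTACT_LIMITS_C = [
    (50.0, "hold"),
    (60.0, "brief contact"),
    (70.0, "60 second contact"),
]


def exceeded_limits(surface_temp_c):
    """Return the label of every recalled limit that the surface temperature exceeds."""
    return [label for limit, label in CONTACT_LIMITS_C if surface_temp_c > limit]


# Chassis surface temperatures mentioned in the thread.
for temp_c in (50.4, 56.0, 57.0):
    print(temp_c, exceeded_limits(temp_c))
```

All three readings land above the recalled hold figure and below the brief-contact figure.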


dfd

I hope someone is still listening to this thread...


I have an R420 and bought the H710 Mini, and I've been having problems since I installed two 860 EVO SSDs on it. The system randomly reboots while installing an OS on this RAID (whether I build a RAID 1 or a single-disk RAID 0, it doesn't matter), or, when the OS does install successfully and the server boots from it, it suddenly reboots a couple of minutes after booting. No warnings, nothing. The image on the monitor disappears, the machine stops, and after a few seconds it starts its boot process.

I noticed that the heatsink of the H710 is way hotter than everything else in the server (I'm working with it open), but the battery temperature seems fine according to the tests in Lifecycle Controller and iDRAC. I guess it was somewhere around 37C. The H710 BIOS shows the following, and I have no idea what it means, where that temperature comes from, or whether it's good or bad:

ROC Temperature: 79C

IOC Temperature: 79C

 

It looks high compared with the CPU's 40C and the inlet's 25C.

The weirdest thing, anyway, is this: the machine had been running fine for 130 days nonstop with a pair of 120 GB SanDisk SSDs. I replaced them with the 1 TB 860 EVOs to get extra storage, and since then... headaches. I spent the whole of last night trying to understand it, with no success. Today when I came back to the datacenter, the machine had been turned off the whole day. It booted perfectly fine and worked fine for a couple of hours before the random reboots started again. Very strange. Sadly, it seems there is no log at all for me to analyze, or I don't know where to find one. It also happened when I tried to install W10 on this RAID. On the first try it took half an hour to transfer 10% of the files and I aborted the installation. On the second try, with a recreated RAID (I tried disabling everything cache-related to see if the problem would stop, but no cigar), it just rebooted during the installation.

 

Today I'm going to try to update the PERC's firmware. I didn't manage to do that through the Lifecycle Controller online update: I was able to update every obsolete firmware on every controller in this server except the PERC's, so I'll have to update it by other means. Before that, though, I will put the SanDisks back in and see whether the random reboots stop even with the current firmware.

 

I'm going mad with this thing. Since I have no access to validated disks where I live, I don't even know what to do anymore.

Any help is welcome.


Hello,

 

My suggestion would be to update the server's firmware and check whether the issue persists. May I know if the drives you have there are enterprise-grade drives? Only enterprise-grade drives should be used in servers.


DELL-Joey C
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#IWork4Dell



Believe it or not, the problem seems to have been memory related. The machine arrived with 32 GB and I had added another 32 GB, making it 64 GB. After I removed the added 32 GB, leaving the machine with only the original 32 GB, the problem vanished entirely...


Hi, thanks for your feedback and for updating the thread. It will help everyone.


Thanks,

Erman Özkurt
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#Iwork4Dell
