June 14th, 2014 11:00

R420 chassis heating due to RAID controller card

hi

We purchased 6 servers from Dell India.

We are facing a severe heating issue on top of the chassis, above the RAID card, on all 6 of these servers.

Server configuration:

R420 - single hex-core processor at 2.2 GHz, 32 GB RAM

4 x 600 GB 15,000 rpm drives
(configured as 2 drives in RAID 1 + 2 drives in RAID 1)

These servers are placed in a datacenter where the temperature is around 18 deg C.

The severe heating occurs on top of the chassis, just above the RAID controller.

When the servers are placed one above the other, as they normally are in a datacenter, they heat up to between 50.4 and 57 deg Celsius.

Dell support has replaced the motherboard, RAID card, power supply, etc. on one of these servers, a non-production one (heating to 50.4 deg Celsius), but the issue is not resolved.

On closer examination of the RAID controller card, I noticed that the fins of the heatsink on top of the card are perpendicular to the airflow instead of parallel to it, so the heatsink is actually trapping all the heat.

The same card in an R620 server has the heatsink fins parallel to the airflow, and it remains cool.

Has anybody else on this forum noticed this, or is anyone else facing such a heating issue?

Could you please check this and let me know the resolution to this problem?

Can the heatsink be turned by 90 degrees so that its fins are parallel to the airflow?

thanks for your help

rajesh mahadevan

mumbai , india

18 Posts

July 17th, 2014 20:00

prolame

Could you please post the detailed configuration of your R420 server?

thanks,

rajesh

6 Posts

July 18th, 2014 01:00

Hello,

Yes, I will send it later today.

18 Posts

July 18th, 2014 03:00

prolame


Thanks for your update.

Could you please let me know the following:

1) Are your servers in a rack, one above the other?

2) What is the temperature of the server?

thanks

rajesh

6 Posts

July 18th, 2014 03:00

The servers are in a full 42U rack.

Temperatures:

cc-hyperv
SEL              | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Intrusion        | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
Fan1A RPM        | 1680.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan1B RPM        | 1560.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan2A RPM        | 5400.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan2B RPM        | 5040.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan3A RPM        | 5280.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan3B RPM        | 5040.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan4A RPM        | 1560.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan4B RPM        | 1560.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan5A RPM        | 1560.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Fan5B RPM        | 1560.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        
Inlet Temp       | 23.000     | degrees C  | ok    | na        | -7.000    | 3.000     | 42.000    | 47.000    | na        
OS Watchdog      | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
VCORE PG         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
3.3V PG          | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
5V PG            | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
USB Cable Pres   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Dedicated NIC    | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
VGA Cable Pres   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Presence         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Presence         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
PLL PG           | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
1.1V PG          | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
BP1 5V PG        | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Presence         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
VSA PG           | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
MEM VDDQ PG      | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
LCD Cable Pres   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
VTT PG           | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Presence         | 0x0        | discrete   | 0x0280| na        | na        | na        | na        | na        | na        
Status           | 0x0        | discrete   | 0x8080| na        | na        | na        | na        | na        | na        
Fan Redundancy   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Riser Config Err | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
1.5V PG          | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
PS2 PG Fail      | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
PS1 PG Fail      | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
MEM VTT PG       | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Presence         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
PCIe Slot1       | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PCIe Slot2       | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PCIe Slot3       | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PCIe Slot4       | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
A                | 0x0        | discrete   | 0x4080| na        | na        | na        | na        | na        | na        
vFlash           | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
CMOS Battery     | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
Presence         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Current 1        | 0.600      | Amps       | ok    | na        | na        | na        | na        | na        | na        
Current 2        | 0.000      | Amps       | ok    | na        | na        | na        | na        | na        | na        
Voltage 1        | 230.000    | Volts      | ok    | na        | na        | na        | na        | na        | na        
Voltage 2        | 230.000    | Volts      | ok    | na        | na        | na        | na        | na        | na        
PS Redundancy    | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Status           | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Status           | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Pwr Consumption  | 126.000    | Watts      | ok    | na        | na        | na        | 420.000   | 462.000   | na        
Power Optimized  | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
SD1              | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
SD2              | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Redundancy       | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
ECC Corr Err     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
ECC Uncorr Err   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
I/O Channel Chk  | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PCI Parity Err   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PCI System Err   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
SBE Log Disabled | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Logging Disabled | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Unknown          | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
CPU Protocol Err | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
CPU Bus PERR     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
CPU Init Err     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
CPU Machine Chk  | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Memory Spared    | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Memory Mirrored  | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Memory RAID      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Memory Added     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Memory Removed   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Memory Cfg Err   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Mem Redun Gain   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
PCIE Fatal Err   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Chipset Err      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Err Reg Pointer  | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Mem ECC Warning  | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Mem CRC Err      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
USB Over-current | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
POST Err         | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Hdwr version err | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Mem Overtemp     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Mem Fatal SB CRC | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Mem Fatal NB CRC | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
OS Watchdog Time | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Non Fatal PCI Er | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Fatal IO Error   | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
MSR Info Log     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Drive 0          | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Cable SAS A      | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Cable SAS B      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Cable SAS C      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Cable SAS D      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Power Cable      | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Signal Cable     | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
PFault Fail Safe | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
ROMB Battery     | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na        
ROMB Battery     | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
Riser 1 Presence | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Riser 2 Presence | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na        
Temp             | 66.000     | degrees C  | ok    | na        | 3.000     | 8.000     | 92.000    | 97.000    | na
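The listing above looks like `ipmitool sensor` output. As a minimal sketch under that assumption (ten pipe-separated columns, with the upper non-critical threshold in the eighth), the following Python pulls out the temperature rows and shows how far each reading sits below its warning threshold. The function names are hypothetical, and the column meanings are an assumption rather than something stated in the thread.

```python
# Assumed column order (as in typical `ipmitool sensor` output):
# name | reading | unit | status | lnr | lcr | lnc | unc | ucr | unr
KEYS = ("name", "reading", "unit", "status", "lnr", "lcr", "lnc", "unc", "ucr", "unr")


def parse_sensor_line(line):
    """Split one pipe-delimited sensor row into a dict, or return None."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != len(KEYS):
        return None
    return dict(zip(KEYS, fields))


def to_float(text):
    """Convert a field to float; 'na' and hex codes such as '0x0' give None."""
    try:
        return float(text)
    except ValueError:
        return None


def temperature_headroom(lines):
    """Return (name, reading, upper_noncritical, headroom) per temperature row."""
    results = []
    for line in lines:
        row = parse_sensor_line(line)
        if row is None or row["unit"] != "degrees C":
            continue
        reading = to_float(row["reading"])
        unc = to_float(row["unc"])
        if reading is None or unc is None:
            continue
        results.append((row["name"], reading, unc, unc - reading))
    return results


# Three rows copied from the listing above.
sample = [
    "Inlet Temp       | 23.000     | degrees C  | ok    | na        | -7.000    | 3.000     | 42.000    | 47.000    | na        ",
    "Temp             | 66.000     | degrees C  | ok    | na        | 3.000     | 8.000     | 92.000    | 97.000    | na",
    "Fan1A RPM        | 1680.000   | RPM        | ok    | na        | 720.000   | 840.000   | na        | na        | na        ",
]

for name, reading, limit, headroom in temperature_headroom(sample):
    print(f"{name}: {reading:.1f} C, warning at {limit:.1f} C, headroom {headroom:.1f} C")
```

With these rows, the sensor named "Temp" reads 66 C against a 92 C warning threshold, so the firmware reports it as "ok" even though it is far hotter than the 23 C inlet.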


I can't paste temperatures from the servers where we took out the riser. But I have some info about the battery on the controller: the battery works up to 65°C, and there is a warning against using it above that.

http://i59.tinypic.com/2620h1v.jpg

6 Posts

July 18th, 2014 03:00

We have 18 R320 servers online now.

One example configuration:

E5-2420, 32 GB RAM, 4x 300 GB SAS, H710. That's just an example; this server runs as hot as the others. In all new R320 and R420 servers we took out the riser, and the temperature on the RAID controller is 3-6°C lower.

Dell says the controller can work at high temperatures. I say yes, but the battery on this controller can't. We have changed our batteries many times.

18 Posts

July 18th, 2014 04:00

prolame

Thanks for the info.

Are these details for an R420 server?

Have you by any chance measured the temperature of the chassis, i.e. the external surface of the server over the RAID card, using a contact thermometer?

rajesh

1 Message

April 15th, 2015 16:00

Hello,

I have the same problem with the R420 and R620 (2x X540).

In the same slot on each type of server, the X540 is disabled due to a heat problem.

It's not just one server but more than 30!

Why is it always the same slot if it's not a design issue?

Regards,

548 Posts

October 13th, 2016 01:00

Not wanting to dig up something long dead, but I was googling an issue with overheating servers and found this thread.

However, reading through the thread and the OP's posts, I find it odd that the OP would actually mark Daniel's 15 June 2015 answer as a "verified answer", considering that he makes the statement "i am absolutely disappointed with dell" on 15 Jul 2015 and other concerns are raised about the shortened lifespan of RAID batteries and fans [:?]

So the question is, can/does anyone other than the OP accept an answer and mark it as "verified answer"?

As an FYI, there are workplace safety criteria for the surface temperature of items that can be touched with a finger or palm (before burns can occur). IIRC, in my jurisdiction, for metallic surfaces, 50C is considered a hold temperature, 60C a brief-contact temperature, and 70C a 60-second contact temperature before injury can occur. On that basis, 56C is likely either the result of poor thermal design, if such a case temperature is unintended, or, if it was intended, a thermal design that needs warning labels on the chassis to indicate that a hazardous "hot spot" exists. In either case there seems to be a bit of a fail on Dell's part w.r.t. this issue.
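To make that comparison concrete, here is a small Python sketch that checks the chassis temperatures reported in this thread against the three figures quoted above. The limits are only the values recalled in this post ("IIRC"), not taken from any standard, and the names `RECALLED_LIMITS_C` and `limits_exceeded` are made up for illustration.

```python
# Limits as recalled in the post above, for metallic surfaces.
# These are unverified, from-memory figures, not values from a standard.
RECALLED_LIMITS_C = [
    (50.0, "hold"),
    (60.0, "brief contact"),
    (70.0, "60 second contact"),
]


def limits_exceeded(surface_temp_c):
    """Return the labels of the recalled limits that the temperature meets or exceeds."""
    return [label for limit, label in RECALLED_LIMITS_C if surface_temp_c >= limit]


# Chassis temperatures mentioned in the thread: 50.4 and 57 (OP), 56 (this post).
for temp in (50.4, 56.0, 57.0):
    print(temp, limits_exceeded(temp))
```

All three readings fall above the recalled 50C figure and below the 60C one, which is the basis for the "hot spot" concern.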

5 Practitioner

274.2K Posts

May 24th, 2018 21:00

dfd

61 Posts

March 26th, 2021 19:00

I hope anyone is still listening to this thread...


I have this R420 and bought the H710 Mini, and I've been having problems since I installed 2 860 EVO SSDs on it. The system randomly reboots while installing an OS on this RAID (I can build a RAID 1 or a single-disk RAID 0, it doesn't matter), or, once the OS is successfully installed and the server boots from it, it suddenly reboots a couple of minutes after booting. No warnings, nothing. The image on the monitor disappears, the machine stops, and after a few seconds it starts its boot process.

I noticed that the heatsink of the H710 is way hotter than everything else in the server (I'm working with it open), but the battery temperature seems fine according to the tests in Lifecycle Controller and iDRAC. I guess it was something around 37-37C. Lifecycle Controller and iDRAC say it is OK. The H710 BIOS says this, and I have no idea what it means, where that temperature comes from, or whether it's good or bad:

ROC Temperature: 79C

IOC Temperature: 79C

It looks high compared with the CPU's 40C and the inlet's 25C.

The weirdest thing is this: the machine had been running fine for 130 days nonstop with a pair of 120 GB SanDisk SSDs. I replaced them with the 1 TB 860 EVOs to get extra storage, and since then... headaches. I spent the whole of last night trying to understand it, with no success. Today when I came back to the datacenter, the machine had been turned off the whole day. It booted perfectly fine and worked fine for a couple of hours before the random reboots started again. Very strange. Sadly it seems there's no log at all for me to analyze, or I don't know where to find one. It also happened when I tried to install W10 on this RAID. At first it took half an hour to transfer 10% of the files and I aborted the installation. On the second try, with a recreated RAID (I tried disabling everything cache-related to see if the problem would stop, but no cigar), it just rebooted during the installation.

 

Now, today, I'm set to try to update the PERC's firmware. I didn't manage to do that through the Lifecycle Controller online update: I managed to update every obsolete firmware on every controller in this server EXCEPT the PERC's, so I have to try updating it by other means. But before that I will put the SanDisks back in and see if the random reboots stop even with the current firmware.

 

I'm going mad with this thing. Since I have no access to validated disks where I live, I don't even know what to do anymore.

Any help is welcome.

Moderator

3.1K Posts

March 28th, 2021 19:00

Hello,

 

My suggestion is to update the server's firmware and check whether the issue persists. May I know if the drives you have there are enterprise-grade drives? Only enterprise-grade drives should be used in servers.

61 Posts

May 31st, 2022 01:00

Believe it or not, the problem seems to have been memory related. After I removed the 32 GB I had added, leaving the machine with only 32 GB (it arrived with 32 GB, I added 32 GB to make it 64 GB, then I removed the added 32 GB), the problem vanished entirely...

Moderator

2.1K Posts

May 31st, 2022 02:00

Hi, thanks for your feedback and for updating the thread. It will help everyone.

38 Posts

December 3rd, 2022 02:00

R420 is a very failed product, the PERC burned me
