I own a C6300 server hosting two C6320p blades, both with exactly the same Xeon Phi 7230 @ 1.30GHz configuration.
With only one C6320p blade in the host chassis, everything works fine.
But as soon as we plug in a second one (even powered off), the C6300 fans go crazy and reach 16,800 RPM (out of a maximum of 17,850), producing a very relaxing airplane-takeoff sound. 🙂
As soon as one blade is powered on and a second one is merely plugged in, the problem appears.
What I tried:
- The available updates:
  - iDRAC: 18.104.22.168 (on both blades)
  - C6300 Firmware 3.18 (https://www.dell.com/support/home/de-ch/drivers/DriversDetails?productCode=poweredge-c6300&driverId=...)
- The "raw 0x30 0x12" ipmitool command:
  - Output with 1 blade:
    01 60 1b 03 18 06 26 00 00 01 ff 00 01 25 2a ff
    ff 0f c2 00 00 01 04 01 04 31 c5 11 ff 0f
  - Output with 2 blades:
    01 8c 1b 03 18 00 00 00 00 01 ff 00 01 25 2a ff
    ff 0f c2 00 00 01 04 01 04 31 c5 11 ff 0f
Does anyone have a mysterious raw ipmitool hex command that would solve this problem?
Or at least more information about what this output means?
The values highlighted in green (03 18) show that the firmware is indeed the latest version available.
But what about the ones highlighted in red?
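Whatever the red bytes turn out to mean, it helps to know exactly which offsets change between the two captures. A minimal sketch that parses the two dumps pasted above and lists the differing bytes; the function names are hypothetical, and the code makes no assumption about what the OEM command (netfn 0x30, command 0x12) encodes at each offset:

```python
# Raw responses to "ipmitool raw 0x30 0x12", copied from the two outputs above.
ONE_BLADE = """
01 60 1b 03 18 06 26 00 00 01 ff 00 01 25 2a ff
ff 0f c2 00 00 01 04 01 04 31 c5 11 ff 0f
"""
TWO_BLADES = """
01 8c 1b 03 18 00 00 00 00 01 ff 00 01 25 2a ff
ff 0f c2 00 00 01 04 01 04 31 c5 11 ff 0f
"""


def parse_hex_dump(dump: str) -> list[int]:
    """Turn whitespace-separated hex bytes (spread over any number of lines) into ints."""
    return [int(token, 16) for token in dump.split()]


def diff_bytes(a: list[int], b: list[int]) -> list[tuple[int, int, int]]:
    """Return (offset, value_in_a, value_in_b) for every 0-based offset where a and b differ."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)} bytes")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    one = parse_hex_dump(ONE_BLADE)
    two = parse_hex_dump(TWO_BLADES)
    for offset, before, after in diff_bytes(one, two):
        print(f"byte {offset:2d}: {before:02x} -> {after:02x}")
```

On these two captures it reports offsets 1 (60 -> 8c), 5 (06 -> 00) and 6 (26 -> 00), counting from 0. Offsets 3 and 4 hold the 03 18 firmware bytes and are identical in both, so only those three bytes react to the second blade.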
I should add that I don't see any other weird behavior on this machine: no blinking LEDs, no strange IPMI bugs... Everything seems to work fine except the fan control.
Thank you for your help!
I "solved" the problem by moving my 2 sleds to slots 2 and 4 of the machine (the bottom ones).
As soon as I have 2 machines plugged into the C6300 enclosure and one of them is in slot 1 or 3, my fans go crazy. I don't understand why, but maybe something is not properly connected between the upper connection card and the fan card / power splitting card?
However, I think I had already tried this configuration before updating the firmware and BIOS of the sleds. Maybe the updates helped.
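The slot pattern described above is easy to turn into an explicit test plan, since a four-slot chassis only allows six two-sled layouts. A minimal sketch, assuming slots are numbered 1 to 4 as in the post and that the rule is exactly "any sled in slot 1 or 3 makes the fans race"; `fans_expected_quiet` and `test_plan` are hypothetical names:

```python
from itertools import combinations

SLOTS = (1, 2, 3, 4)
# Assumption taken from the observation above: the fans only stay quiet
# when neither sled occupies slot 1 or slot 3.
SUSPECT_SLOTS = {1, 3}


def fans_expected_quiet(pair: tuple[int, int]) -> bool:
    """Predict the outcome of a two-sled layout under the slot 1/3 hypothesis."""
    return SUSPECT_SLOTS.isdisjoint(pair)


def test_plan() -> list[tuple[tuple[int, int], bool]]:
    """List every two-sled slot layout together with its predicted fan behavior."""
    return [(pair, fans_expected_quiet(pair)) for pair in combinations(SLOTS, 2)]


if __name__ == "__main__":
    for pair, quiet in test_plan():
        print(f"sleds in slots {pair}: {'quiet' if quiet else 'fans at full speed'}")
```

Under that rule only slots (2, 4) should stay quiet. Recording the real outcome of each of the six layouts next to the prediction would show whether the slot rule holds, or whether the firmware and BIOS updates were what actually helped.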
If you hear of anyone having the same kind of problem, do not hesitate to contact me.
I won't need to add any more sleds to this enclosure, so this solution is fine for me. But I am still curious about the real origin of the problem.
Thank you all for your help.
Regarding the fans running at high speed: does it matter which blade is installed first? Does the issue follow a particular blade, or does the installation order not matter?
Check whether you have the current BIOS:
Dell PowerEdge Server BIOS C6320P, version 2.3.0
Release date: 06 Feb 2020
Was this working normally before and just started doing this? Was anything changed before this started?
Has anything changed in the hardware, such as added memory?
Are there any hardware issues on either blade?
Please let me know and I'll be happy to assist you.
Thank you for your answer.
Indeed, we tried changing the order in which we plug in the blades. Apparently, the problem only shows up when both blades are plugged in; it depends neither on the order nor on the slots used in the chassis.
Both BIOSes are version 2.3.0.
This machine has never worked properly except with only one blade plugged in.
There is indeed one hardware change:
We had to add a cable in each blade to connect it to our front SSD bay. No disk is attached to the blades themselves; each of them uses an SSD from the chassis.
Apparently this prevents the iDRAC from locating the disk in the "Storage" section. But we can boot our OS from the blades, and everything seems to work as expected.
We use standard Intel SSDs bought from Dell with the machine.
Do you think this may be related?
Thank you again.
That hardware change might have caused this. Unsupported hardware or a near-maximum hardware configuration can cause fan issues. Can you first try a simple step: draining flea power? I ask because I'm seeing a RAC0501 warning in the screenshot.
I'm not sure, but there may also be an issue with the iDRAC firmware. Have you updated the iDRAC recently, or did the fans start running at maximum speed after an iDRAC update? We once solved this by rolling back to the previous version, because the iDRAC update had caused the fans to run at maximum speed. In general, fan issues get fixed when new iDRAC and LCC firmware versions are released for PowerEdge servers.
Let us know if it helps.
Nothing new after draining the flea power. It was worth a try, though.
I recently installed the iDRAC 22.214.171.124 update, hoping it would solve the problem, so we already had the problem with the previous version (126.96.36.199).
As for the hardware change, I tried removing both SATA cables to the front panel, so the machines have no access to any disk and stay in the BIOS. But the problem remains (even after draining the flea power).
Thank you for helping 🙂
Thanks for the info, and sorry that the problem persists. I'm digging through our articles to try to find out what causes this issue. I assume you didn't upgrade any memory or change memory slots, did you? I have seen a similar situation on the C6525 model. I'm asking because added memory can also result in an incorrect memory population, which can make the fans run at high speed even when the system is idle.
No problem, it is already great to have some help / feedback.
We did not add memory or change the slots in use.
Do you think it would be worth removing all the RAM modules and trying without them?
That would be a good test to try. Leave one DIMM per processor and check the results.
You may strip the system down to the minimum needed to POST, check whether the fans are normal, and then put things back a little at a time until you find the faulty component:
Minimum configuration to POST:
- One power supply unit
- One processor (CPU) (minimum for troubleshooting)
- One memory module (DIMM) installed in socket A1
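The "put things back a little at a time" procedure is a simple linear search over the parts removed for the minimum configuration. A minimal sketch, assuming each test is a manual power-on check whose result you report as True (fans normal) or False; `find_faulting_component`, `fans_normal` and `MINIMUM` are hypothetical names, not part of any Dell tool:

```python
from typing import Callable, Optional, Sequence

# Hypothetical sentinel: the fault is already present before anything is added back.
MINIMUM = "minimum configuration"


def find_faulting_component(
    optional_parts: Sequence[str],
    fans_normal: Callable[[list[str]], bool],
) -> Optional[str]:
    """Start from the minimum POST configuration and add parts back one at a time.

    `fans_normal(installed)` stands for a manual test: install the parts in
    `installed` on top of the minimum configuration, power on, and report
    whether the fans stay at normal speed. Returns the first part whose
    addition makes the fans ramp up, MINIMUM if the fault is already there
    with nothing added, or None if every part goes back in without a fault.
    """
    installed: list[str] = []
    if not fans_normal(installed):
        return MINIMUM
    for part in optional_parts:
        installed.append(part)  # keep earlier parts installed, add one more
        if not fans_normal(installed):
            return part
    return None
```

This assumes a single part is at fault. It cannot isolate a problem that only appears with a combination, such as two sleds installed together, because no single addition is the culprit there.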
Please let me know your results.
I just tried with the minimum configuration to POST. I even replaced the remaining RAM module in each blade to make sure those are not responsible.
So I only see two possibilities:
1. The fan control card does not understand something that happens when two blades with this configuration are plugged in, but it does not come from one specific blade. Only the combination triggers the problem.
2. From what I have read, the iDRAC sends the fan speed to the fan control card. If that is true, what if the two iDRACs disagree on the value to set? They do have exactly the same version and the same settings in the Fan section, though.
Please check the Lifecycle log for any reported errors. You can view this log in the iDRAC, or in the Lifecycle Controller by pressing F10 during boot.
Can you confirm all fans are working?
A fan failure or an excessively high system temperature will trigger a notification from the BMC, and the remaining fans will ramp to 100% duty cycle.
Try swapping the fans to get a good reseat: remove fan 1, move all the other fans down one slot, and put fan 1 in the last slot.
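Besides the Lifecycle log, the BMC's own fan sensors can confirm whether every fan reports "ok". A minimal sketch, assuming the readings were captured with `ipmitool sdr type Fan` and use its usual pipe-separated columns (name | sensor ID | status | entity | reading); the sample rows, the 17,850 RPM maximum taken from the first post, and the helper names are illustrative assumptions, and the exact layout can vary between BMCs:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative rows only (hypothetical sensor names), in the layout assumed above.
SAMPLE = """\
FAN1 RPM         | 30h | ok  |  7.1 | 16800 RPM
FAN2 RPM         | 31h | ok  |  7.1 | 16800 RPM
"""

MAX_RPM = 17_850  # maximum quoted in the first post; assumed to apply to every fan


@dataclass
class FanReading:
    name: str
    status: str          # "ok" is normal; anything else deserves a closer look
    rpm: Optional[int]   # None when the sensor reports no numeric RPM value


def parse_sdr_fans(text: str) -> list[FanReading]:
    """Parse pipe-separated SDR rows into FanReading objects, skipping malformed lines."""
    fans = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 5:
            continue
        name, status, reading = fields[0], fields[2], fields[4]
        parts = reading.split()
        rpm = int(float(parts[0])) if len(parts) == 2 and parts[1] == "RPM" else None
        fans.append(FanReading(name, status, rpm))
    return fans


def summarize(fans: list[FanReading], max_rpm: int = MAX_RPM) -> list[str]:
    """One line per fan: its status and its speed as a percentage of max_rpm."""
    lines = []
    for fan in fans:
        if fan.rpm is None:
            speed = "no reading"
        else:
            speed = f"{fan.rpm} RPM ({100 * fan.rpm / max_rpm:.1f}% of max)"
        flag = "" if fan.status == "ok" else "  <-- check this fan"
        lines.append(f"{fan.name}: {fan.status}, {speed}{flag}")
    return lines


if __name__ == "__main__":
    for line in summarize(parse_sdr_fans(SAMPLE)):
        print(line)
```

With the sample rows, each fan comes out at 94.1% of the assumed maximum. A fan flagged with a non-"ok" status would be consistent with the BMC ramping the remaining fans to 100%; all fans reporting "ok" at that speed points back toward the chassis-level fan control instead.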