Half of C6300/C6320 Fans Don't Work
Hi folks -- having a heck of a time trying to understand what is going on with the fans in this C6300 chassis I'm trying to turn up.
SYMPTOM: The fans for the left two sleds are going full blast (I assume) and the fans for the right two slots are either not working or barely turning. (I disconnected the left-side fans to try to discern what the right side is doing, and the right-side fans do seem to be turning, but barely. I could be wrong and they are just dead or turned off.)
OBSERVATIONS:
- There is nothing plugged into "fan speed control" header on power distribution board.
- Using only 1 sled out of 4, moved sled to the right side but fans remain only engaged on left.
- Swapped known working fan from left side to right side to eliminate chance of a defective fan.
- It appears there have been some known bugs around fans and sled mismatching (C6320 in chassis configured for C6320P and vice-versa?) so a firmware update may rectify some problems: https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=ph8dp
TROUBLESHOOTING:
- Within fan control of iDRAC, get message "RAC0709: Unable to retrieve the fan information. Power on the server. If the server is already powered on, wait for a few minutes and refresh the page. If the problem persists, contact your service provider." (NOTE: OpenManage manual says that RAC0709 error is actually "Unknown Server Inserted Into Chassis")
- Using IPMI documentation for C6220 and C6320P (unable to find docs for C6320), attempted a few commands that failed:
- Command: ipmitool -I lanplus -H 192.168.50.14 -U root -P ***** raw 0x30 0x12
Result: Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 cmd=0x12 rsp=0x80): Unknown (0x80)
Source: https://www.dell.com/support/manuals/us/en/04/poweredge-c6320p/pec6320p_om_pub/checking-the-fcb-firmware-version
- Command: ipmitool -I lanplus -H 192.168.50.14 -U root -P ***** raw 0x30 0xc8 0x01 0x0A 0x05 0x00 0x00 0x00
Result: Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 cmd=0xc8 rsp=0xd5): Command not supported in present state
Source: https://www.dell.com/support/manuals/us/en/04/poweredge-c6320p/pec6320p_om_pub/checking-the-chassis-type-sticky-bits-for-the-poweredge-c6320p-sled
NEXT STEPS (please provide feedback, guidance, tips and suggestions... thanks!)
- Continue trying to verify/update Fan Control Board (FCB) firmware version. Here are some relevant links found on Dell site and web....
- Continue attempts to check "sticky bits" to ensure chassis is configured for C6320 sleds not C6320P
- Try additional sleds that are en route to note any different behavior or ability to better see/probe the chassis through those sleds.
- Exchange this chassis for another from the seller, however their inventory may be similarly configured chassis enclosures from the same environment.
CA_Tallguy
May 29th, 2020 12:00
UPDATE! SWAPPED TO DIFFERENT CHASSIS
Thanks for your reply, Stefan — I was nearly certain firmware update wouldn't work if I couldn't even probe to get the existing version so didn't think it was a good idea to try. I get very anxious doing firmware updates for fear of bricking something.
But interesting news! I swapped for a different chassis today and fans seem like they are WORKING great now. Still need to mess with the system to see if they are adjusting speed properly.
There are SO many settings and data points available in iDRAC that I was not seeing before. As I'm new to Dell enterprise hardware I had no idea what I should be seeing. But a BIG clue to anyone else facing these types of symptoms: I was not seeing power supply info or fan speeds, and I was getting the message "RAC0709: Unable to retrieve the fan information" as I mentioned in the original post. Those conditions suggest that the sled was not properly communicating with the chassis.
Once I powered up this chassis with only a single power supply inserted, I immediately got flashing power button and indicator lights that I hadn't seen on the previous chassis.
So something was definitely wrong with that previous chassis. Among the theories I have been mulling over that may have been the issue with the original chassis I had....
So here are what I think are clues that the sled(s) are not communicating with the chassis (my theory as to the issue I was facing)….
All that stuff and more now appears in my iDRAC interface. Having not been a Dell or PowerEdge C user before I didn't know how much was missing.
I still plan to update firmware and will update this thread with any info that might be helpful. I can now probe for versions using IPMI, so I think I have version 1.25:
ipmitool -I lanplus -H 192.168.50.16 -U root -P ***** raw 0x30 0x12
01 7e 1b 01 25 00 00 00 00 01 ff 00 01 24 2e ff
ff 0f c2 00 00 01 04 01 04 31 c5 11 ff 0f
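Reading the version out of that response can be sketched in Python. This is only a sketch under an assumption: that the fourth and fifth bytes of the response hold the major and minor version, which is inferred from the C6320P manual example where a v3.09 board answers `01 69 1b 03 09 ...`. The function name `fcb_version` is made up for illustration, and the other bytes are deliberately left undecoded.

```python
def fcb_version(response: str) -> str:
    """Return 'major.minor' from the text of a `raw 0x30 0x12` response."""
    # ipmitool prints the response as whitespace-separated hex bytes,
    # sometimes wrapped over two lines, so split on any whitespace.
    data = response.split()
    if len(data) < 5:
        raise ValueError("response too short to contain a version")
    # Assumption: bytes 3 and 4 (zero-based) are major and minor,
    # and the minor byte is displayed exactly as printed (e.g. "09").
    major, minor = data[3], data[4]
    int(minor, 16)  # raises ValueError if the byte is not valid hex
    return f"{int(major, 16)}.{minor}"


reply = """01 7e 1b 01 25 00 00 00 00 01 ff 00 01 24 2e ff
ff 0f c2 00 00 01 04 01 04 31 c5 11 ff 0f"""
print(fcb_version(reply))
```

If the byte layout differs on other FCB revisions, only the two index positions would need to change.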
DELL-Stefan R
Moderator
May 25th, 2020 07:00
Hi CA_Tallguy,
you did a very good job there. All those steps you've done are quite good. Maybe you're more a professional than I am.
The NEXT STEPS you provided sound very good, and you should go that way to see what's going on. The FCB update is the most common one.
After you have done this, please let me know if it solved this weird issue.
If the system is still under warranty, please provide me the ServiceTag in a private message.
Have a good one!
Cheers
Stefan
CA_Tallguy
May 25th, 2020 10:00
Thanks Stefan. I am not confident that the chassis is communicating with the sleds otherwise the IPMI commands should be returning data, correct? Can you confirm those are the correct IPMI commands (I took the information from the C6320p manual and assume they are the same for C6320... the C6320 manual does not cover the FCB).
I have alternate sleds to try in the chassis now and I also updated the firmware on one last night. So I'll try the IPMI again. I'd appreciate it if someone could confirm these are the correct commands and share any tips on when to run them, since the error I got was about not being able to run in the "present state".
Dell-DylanJ
4 Operator
May 25th, 2020 11:00
I don't use and haven't supported IPMI, but I would expect some return data, even if it were just hex values. For example, if I were to issue commands using racadm (which does pretty much the same thing), you would see some returned data in the window.
One thing that may be at issue here is that fan capability has changed. What firmware revisions are we working with for the chassis and iDRAC? Linked below is a post from a 14G user that may have run into something similar. You might look and compare.
EDIT: https://www.dell.com/community/PowerEdge-Hardware-General/Dell-ENG-is-taking-away-fan-speed-control-away-from-users-iDrac/td-p/7441702
CA_Tallguy
May 25th, 2020 11:00
Thanks... can you repost the link? doesn't seem like it came through.
I am not sure about firmware in the chassis. The IPMI commands I posted were supposed to tell me version for the FCB. I'm new to IPMI and RACADM but will try more today to get any answer back from the chassis at all. For all I know there could be some wire missing or detached so one way or another my first goal is to confirm some sort of communication and then secondarily I will try to update the firmware. (I'm also very close to just swapping this chassis for another and letting someone else deal with this particular problem.)
CA_Tallguy
May 26th, 2020 22:00
Does anyone know if the fan controls need to be issued from the 4th sled? I have seen this mentioned in some discussions about the FCB on previous models such as here….
http://lists.us.dell.com/pipermail/poweredgec-tools/2013-September/000161.html
This would present a bit of a catch-22, as the fans hardly move any air on that side of the chassis, so the CPUs will quickly overheat.
DELL-Stefan R
Moderator
May 27th, 2020 03:00
For me, IPMI is also a bit of rocket science.
But I guess you can do it from any sled; just pick one, I would say.
Does the behavior change if you change the values to, let's say, 50% (hex(int(50)) = 0x32) for each fan?
ipmitool -I lan -H -P -U root raw 0x32 0x32 0x32 0x32
CA_Tallguy
May 27th, 2020 09:00
Yes the IPMI is also feeling like a bit of rocket science for me too, Stefan LOL.
I don't think that your format is correct. The "raw" command is not just the fans and has to be in a format:
ipmitool -I lanplus -H ipaddress -U username -P password raw netfn cmd data
In this format, the netfn element is the network function, which identifies the functional message class and clusters IPMI commands into different sets. The cmd element represents a unique one-byte command value within a given network function. Finally, the data element provides additional parameters for a request or response, if any. (source: https://www.dell.com/downloads/global/power/ps4q07-20070387-Babu.pdf )
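To make the netfn / cmd / data layout concrete, here is a small Python sketch that assembles the argument list for such a call. The helper name `build_raw_command` and the `PASSWORD` placeholder are made up for illustration; the sketch only builds the command line and does not contact any BMC.

```python
import shlex


def build_raw_command(host, user, password, netfn, cmd, data=()):
    """Build the argv list for `ipmitool ... raw netfn cmd data`."""
    argv = ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", user, "-P", password, "raw"]
    # netfn first, then cmd, then any data bytes, each written as 0xNN.
    for value in (netfn, cmd, *data):
        # Basic sanity check only: nothing here may exceed one byte.
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{value!r} does not fit in one byte")
        argv.append(f"0x{value:02x}")
    return argv


# The two requests tried in this thread: netfn 0x30 with cmd 0x12 (no data)
# and netfn 0x30 with cmd 0xc8 followed by six data bytes.
fcb_query = build_raw_command("192.168.50.16", "root", "PASSWORD", 0x30, 0x12)
sticky_query = build_raw_command("192.168.50.16", "root", "PASSWORD", 0x30, 0xC8,
                                 [0x01, 0x0A, 0x05, 0x00, 0x00, 0x00])
print(shlex.join(fcb_query))
print(shlex.join(sticky_query))
```

Keeping the bytes as integers until the last moment makes it harder to mix up which position is the netfn and which is the cmd.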
NOTE: I am color coding everything as: netfn cmd data and response
In my listed commands below and in original post, I appear to be issuing commands 0x12 and 0xc8 to "netfn" = 0x30 ( and to repeat above documentation, "network function ... identifies the functional message class and clusters IPMI commands into different sets" ). IPMI is doing a nice job of repeating back and parsing out how it interpreted original command and then what the response code is for the operation…. (0x80 = "Unknown" and 0xd5 = "Command not supported in present state").
$ ipmitool -I lanplus -H 192.168.50.16 -U root -P ***** raw 0x30 0x12
Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 cmd=0x12 rsp=0x80): Unknown (0x80)
$ ipmitool -I lanplus -H 192.168.50.16 -U root -P ***** raw 0x30 0xc8 0x01 0x0A 0x05 0x00 0x00 0x00
Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 cmd=0xc8 rsp=0xd5): Command not supported in present state
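The fields that ipmitool echoes back in these error lines can be pulled apart with a short Python sketch. The function name `parse_raw_error` is made up for illustration. The sketch relies on the IPMI specification's completion-code ranges as I understand them: 0xd5 is a generic code, while values from 0x80 to 0xbe are defined per command, which would explain why ipmitool can only label 0x80 as "Unknown".

```python
import re

# Matches the netfn, cmd and rsp fields of an "Unable to send RAW command" line.
ERROR_RE = re.compile(
    r"netfn=(0x[0-9a-fA-F]+).*?cmd=(0x[0-9a-fA-F]+)\s+rsp=(0x[0-9a-fA-F]+)"
)


def parse_raw_error(line):
    """Return the netfn, cmd and completion code from an ipmitool error line."""
    match = ERROR_RE.search(line)
    if match is None:
        return None  # not an error line in the expected shape
    netfn, cmd, rsp = (int(group, 16) for group in match.groups())
    return {
        "netfn": netfn,
        "cmd": cmd,
        "rsp": rsp,
        # Codes 0x80-0xbe are command-specific, so their meaning depends on
        # the (here OEM) command that returned them.
        "command_specific": 0x80 <= rsp <= 0xBE,
    }


line = ("Unable to send RAW command (channel=0x0 netfn=0x30 lun=0x0 "
        "cmd=0x12 rsp=0x80): Unknown (0x80)")
print(parse_raw_error(line))
```

A command-specific code from an OEM netfn can only be interpreted with the vendor's documentation for that exact command.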
I have not been able to see fan speeds when they are in operation anywhere at all, not just missing in IPMI. Above commands and as shown in my original post on this thread do not seem to do anything useful. I will try again today to elicit some useful info, even if not on the fans.
I am not sure if 0x30 is the correct "netfn" for this system. Any idea where I can verify this? The commands I am issuing are not giving me anything useful or expected, as shown. Or is there a way to probe for the system to tell me what the valid/active "netfn" codes are for my configuration?
I suspect it could be different for PowerEdge C platform vs regular PowerEdge as the fan control/power distribution board is shared between the 4 different compute sleds/motherboards whereas on a normal PowerEdge server, there is only one chassis and fan/cooling setup supporting a single "compute sled" or motherboard.
On the C6320P manual, they do have a section on checking "sticky bits" and that is where I am getting the 0x30 netfn…… https://www.dell.com/support/manuals/us/en/04/poweredge-c6320p/pec6320p_om_pub/checking-the-chassis-type-sticky-bits-for-the-poweredge-c6320p-sled?guid=guid-2ff1b289-3d6f-4f29-8b4c-89c2227ad408&lang=en-us
I assume that the "netfn" is the same for the C6320 (no p) and C6320p sleds as both are using the C6300 chassis. The C6320p just has one processor, not two.
The really confusing part of this setup is knowing if these IPMI commands issued to the various sleds are making it back to the actual chassis that holds four sleds. As per my previous post (05-26-2020 10:33 PM), I have seen indications in places that on previous generations the commands had to be issued from the fourth sled ONLY and that if issued from sleds 1 to 3 they may not work. There is nothing mentioned about this that I can see in the C6320P manual. And as I've said there is NOTHING about any of this in the C6320 manual. So I feel like I'm playing with a minimally documented feature.
If anyone has more info on all this I would be very grateful. Especially if anyone can verify that I should be issuing commands to 0x30 and if I have to use ONLY the fourth sled. As I've said, if I have to use the 4th sled that is going to be tricky because the cooling is not coming up as it should. I likely will have to rig up some temporary cooling for the CPUs as the whole point of this thread is to get the fans to work properly over on that side.
CA_Tallguy
May 27th, 2020 13:00
Sure seems like an awful lot of data is missing.... nothing for fans, PSU's, drives (all in the main chassis) but also memory, CPU, PCI on the sled (this sled has PCIe LSI raid card).
$ ipmitool -I lanplus -H 192.168.50.16 -U root -P ****** sdr list
SEL | Not Readable | ns
Temp | 70 degrees C | ok
Temp | 62 degrees C | ok
OS Watchdog | 0x00 | ok
VCORE PG | 0x00 | ok
VCORE PG | 0x00 | ok
3.3V PG | 0x00 | ok
5V PG | 0x00 | ok
Dedicated NIC | 0x00 | ok
Presence | 0x00 | ok
Presence | 0x00 | ok
PLL PG | 0x00 | ok
PLL PG | 0x00 | ok
1.1V PG | 0x00 | ok
M23 VDDQ PG | 0x00 | ok
M23 VTT PG | 0x00 | ok
FETDRV PG | 0x00 | ok
VSA PG | 0x00 | ok
VSA PG | 0x00 | ok
M01 VDDQ PG | 0x00 | ok
M01 VDDQ PG | 0x00 | ok
M23 VTT PG | 0x00 | ok
M01 VTT PG | 0x00 | ok
VTT PG | 0x00 | ok
VTT PG | 0x00 | ok
M23 VDDQ PG | 0x00 | ok
Status | 0x00 | ok
CPU Throttle | Not Readable | ns
Status | 0x00 | ok
CPU Throttle | Not Readable | ns
1.5V PG | 0x00 | ok
M01 VTT PG | 0x00 | ok
PCIe Slot1 | Not Readable | ns
PCIe Slot2 | Not Readable | ns
PCIe Slot3 | Not Readable | ns
PCIe Slot4 | Not Readable | ns
PCIe Slot5 | Not Readable | ns
PCIe Slot6 | Not Readable | ns
PCIe Slot7 | Not Readable | ns
A | 0x00 | ok
B | 0x00 | ok
CMOS Battery | 0x00 | ok
Pwr Consumption | 64 Watts | ok
Power Optimized | Not Readable | ns
ECC Corr Err | Not Readable | ns
ECC Uncorr Err | Not Readable | ns
I/O Channel Chk | Not Readable | ns
PCI Parity Err | Not Readable | ns
PCI System Err | Not Readable | ns
SBE Log Disabled | Not Readable | ns
Logging Disabled | Not Readable | ns
Unknown | Not Readable | ns
CPU Protocol Err | Not Readable | ns
CPU Bus PERR | Not Readable | ns
CPU Init Err | Not Readable | ns
CPU Machine Chk | Not Readable | ns
Memory Spared | Not Readable | ns
Memory Mirrored | Not Readable | ns
Memory RAID | Not Readable | ns
Memory Added | Not Readable | ns
Memory Removed | Not Readable | ns
Memory Cfg Err | Not Readable | ns
Mem Redun Gain | Not Readable | ns
PCIE Fatal Err | Not Readable | ns
Chipset Err | Not Readable | ns
Err Reg Pointer | Not Readable | ns
Mem ECC Warning | Not Readable | ns
Mem CRC Err | Not Readable | ns
USB Over-current | Not Readable | ns
POST Err | Not Readable | ns
Hdwr version err | Not Readable | ns
Mem Overtemp | Not Readable | ns
Mem Fatal SB CRC | Not Readable | ns
Mem Fatal NB CRC | Not Readable | ns
OS Watchdog Time | Not Readable | ns
Non Fatal PCI Er | Not Readable | ns
Fatal IO Error | Not Readable | ns
MSR Info Log | Not Readable | ns
TXT Status | Not Readable | ns
PFault Fail Safe | Not Readable | ns
FAN_1 | disabled | ns
FAN_2 | disabled | ns
FAN_3 | disabled | ns
FAN_4 | disabled | ns
FAN_5 | disabled | ns
FAN_6 | disabled | ns
FAN_7 | disabled | ns
FAN_8 | disabled | ns
FAN_9 | disabled | ns
FAN_A | disabled | ns
FAN_B | disabled | ns
FAN_C | disabled | ns
FAN_D | disabled | ns
Inlet Temp | disabled | ns
Exhaust Temp | disabled | ns
Input Current | disabled | ns
Input Voltage | disabled | ns
SC FW Status | Not Readable | ns
HDD 8 Status | Not Readable | ns
HDD 9 Status | Not Readable | ns
HDD 10 Status | Not Readable | ns
HDD 11 Status | Not Readable | ns
HDD 12 Status | Not Readable | ns
HDD 13 Status | Not Readable | ns
HDD 14 Status | Not Readable | ns
HDD 15 Status | Not Readable | ns
PSU 1 Status | Not Readable | ns
PSU 2 Status | Not Readable | ns
PSU 3 Status | Not Readable | ns
PSU 4 Status | Not Readable | ns
PSU 5 Status | Not Readable | ns
PSU 6 Status | Not Readable | ns
PSU 7 Status | Not Readable | ns
PSU 8 Status | Not Readable | ns
PSU Mismatch | Not Readable | ns
PSU Redundancy | Not Readable | ns
FW Update Status | Not Readable | ns
FAN_E | disabled | ns
FAN_F | disabled | ns
PSU 9 Status | Not Readable | ns
PSU 10 Status | Not Readable | ns
PSU 11 Status | Not Readable | ns
PSU 12 Status | Not Readable | ns
PSU 13 Status | Not Readable | ns
PSU 14 Status | Not Readable | ns
PSU 15 Status | Not Readable | ns
PSU 16 Status | Not Readable | ns
PSU 17 Status | Not Readable | ns
PSU 18 Status | Not Readable | ns
PSU 19 Status | Not Readable | ns
PSU 20 Status | Not Readable | ns
HDD 1 Status | Not Readable | ns
HDD 2 Status | Not Readable | ns
HDD 3 Status | Not Readable | ns
HDD 4 Status | Not Readable | ns
HDD 5 Status | Not Readable | ns
HDD 6 Status | Not Readable | ns
HDD 7 Status | Not Readable | ns
HDD 16 Status | Not Readable | ns
HDD 17 Status | Not Readable | ns
HDD 18 Status | Not Readable | ns
HDD 19 Status | Not Readable | ns
HDD 20 Status | Not Readable | ns
HDD 21 Status | Not Readable | ns
HDD 22 Status | Not Readable | ns
HDD 23 Status | Not Readable | ns
HDD 24 Status | Not Readable | ns
CPU Usage | 0 percent | ok
IO Usage | 0 percent | ok
MEM Usage | 0 percent | ok
SYS Usage | 0 percent | ok
Mezz Presence | 0x00 | ok
PSU 21 Status | Not Readable | ns
PSU 22 Status | Not Readable | ns
PSU 23 Status | Not Readable | ns
PSU 24 Status | Not Readable | ns
PSU 25 Status | Not Readable | ns
PSU 26 Status | Not Readable | ns
PSU 27 Status | Not Readable | ns
PSU 28 Status | Not Readable | ns
PSU 29 Status | Not Readable | ns
PSU 30 Status | Not Readable | ns
PSU 31 Status | Not Readable | ns
FCB Button | Not Readable | ns
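A listing this long is easier to compare between chassis when it is grouped by the status column. The following Python sketch does that for text in the `name | reading | status` shape shown above; the function name `summarize_sdr` and the four-row sample are made up for illustration, with the sample rows copied from the output.

```python
def summarize_sdr(text):
    """Group `ipmitool sdr list` rows by their status column."""
    by_status = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue  # skip the shell prompt, blank lines and anything else
        name, reading, status = parts
        by_status.setdefault(status, []).append((name, reading))
    return by_status


sample = """Temp | 70 degrees C | ok
FAN_1 | disabled | ns
PSU 1 Status | Not Readable | ns
Pwr Consumption | 64 Watts | ok"""

groups = summarize_sdr(sample)
for status, rows in groups.items():
    print(status, len(rows), [name for name, _ in rows])
```

Running the same summary against a sled in a working chassis and diffing the "ns" group would show which chassis-level sensors (fans, PSUs, drives) come back once communication is restored.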
DELL-Stefan R
Moderator
May 28th, 2020 02:00
Hey,
I wonder why it shows Not Readable.
I found an interesting article online (not a Dell article), where someone changed the fan speed successfully, but it was not a PowerEdge C system; it was an R330. But maybe you can try, if not already done, the steps he provided there:
https://dell.to/2ZKyWv2
Just to add, I guess you don't need it, but here is a white paper about the IPMI on PowerEdge servers:
https://dell.to/2TMMyST
What I also found is this process to update the FCB and Fan tables.
To verify the current Fan Controller Board firmware, run the following ipmi command on the system:
ipmitool raw 0x30 0x12
The output should look something like this (the highlighted is the current FCB firmware version):
C6300 Fan Controller Board Update 2.12
Newest racadm 8.5- RHEL
Newest racadm 8.5- Windows (32-bit)
Newest racadm 8.5- Windows (64-bit)
Instructions:
FC0210.bin to https://dell.to/2XCjxdB
FT0602.bin to https://dell.to/2XB0gcK
***NOTE**** IF YOU GET AN ERROR THAT THE FILE TYPE IS INVALID, UPDATE TO RACADM 8.5
For FCB FW:
C:\WINDOWS\system32>racadm -r <ip.address.of.idrac> -u root -p calvin update -f c:/path/to/FC0210.sc
For the local update, run
# racadm update -f /path/to/FC0210.sc
C:\WINDOWS\system32>racadm -r <ip.address.of.idrac> -u root -p calvin update -f c:/path/to/FT0602.sc
To verify the new Fan Controller Board firmware, run the following ipmi command on the system:
ipmitool raw 0x30 0x12
The output should look something like this (the highlighted will show the current FCB firmware version):
CA_Tallguy
May 28th, 2020 08:00
Thanks Stefan - The IPMI command ipmitool raw 0x30 0x12 you discuss is what I have been trying but I get response "Unknown (0x80)" for example (see original post and others for exact commands I used and responses).
That article showing process on R330 is interesting and it is helpful to see how someone is messing around with the fans even on another platform. The difference in that case is that the fans are visible and reporting speeds through iDRAC, and the system is responsive to his IPMI commands. In my case, they are not even showing speeds in iDRAC and checking firmware in IPMI does not work for me.
Maybe my problem is as simple as a missing or loose cable connection? Or maybe the FCB is bad? The fans do turn on and off with the system power so that doesn't seem likely. I can't see anything that seems out of place except that there is nothing connected to "fan speed control" header (but maybe that is just for programming?)
But since FCB is shared by four different sleds this is expected to be more tricky than if the FCB was on a dedicated motherboard. I am not seeing power supply info either so that makes me think the problem may be how I am communicating with the base chassis. I wonder if 0x30 is the correct code for what I need to do or if it could be some other identifier for my system.
So for troubleshooting, I think that is the best place to start… to somehow verify that I can talk to the base chassis, even another component. I have moved sled around to different slots in case only one can talk to the base chassis and that hasn't seemed to help. I've been probing IPMI and RACADM trying to get some data from base chassis but have not been able to determine if I am seeing any. Any thoughts on some other system that I should be able to probe, like the power supplies, to verify that I am able to talk to the chassis?
I have not attempted any update because I just assume it won't work if it isn't responding to the IPMI commands to check firmware versions. I'd much prefer to see a response to those commands before attempting update anyway.
DELL-Stefan R
Moderator
May 29th, 2020 01:00
Oh well, I would go for the update anyways.
As the latest update brings the SEL for thermal and power events in iDRAC it may also add the function to see the fan speeds.
C6320 - PowerEdge C6300 ENCLOSURE FCB FW Update
https://dell.to/3dfLyhL
Worth a shot, don't you think?
Cheers,
Stefan
CA_Tallguy
May 30th, 2020 12:00
Couple notes/thoughts/questions on firmware upgrade which is currently underway on my system....
FROM C6320P MANUAL
Check the fan control board (FCB) firmware version by executing this IPMI command:
ipmitool -U server_user_name -P server_user_pass_word -H server_IP -I lanplus raw 0x30 0x12
For example, to check the FCB firmware version v3.09, execute this command:
ipmitool -U root -P calvin -H 10.3.25.127 -I lanplus raw 0x30 0x12
Response: 01 69 1b 03 09 06 26 00 00 04 ff 00 01 2a 2f ff ff 0f c2 00 00 01 04 01 04 31 c5 11 ff 0f
NOTE:
CA_Tallguy
June 1st, 2020 13:00
Quick update.... still trying to get the firmware updated now that I have a chassis that is communicating with the sleds. I started a new thread about the FCB firmware update....
https://www.dell.com/community/Systems-Management-General/C6300-C6320-Fan-Control-FW-quot-Success-quot-But-No-Version/m-p/7614728#M29412
One note.... with the new chassis, I have had two sleds inserted on the side opposite to the power supplies. (I am not sure if that side is slots 1/2 or 3/4). When I added a sled to the bottom slot in the row next to power supplies, the fans suddenly (finally) went down below 16K RPM to 13/14000 RPM. Further, checking FCB firmware version, I now see a "fan table" version number!! Hooray!
Fan Table version shown below in RED and I think sled number is in GREEN
Now if I can just get the damn firmware to update!!
**** FURTHER UPDATES ABOUT FIRMWARE WILL BE AT LINK ABOVE ****