TaylorData

R710 PSU Redundancy Failure

We have had a few servers fail to stay up when just one of their PSUs goes down. We brought one of the problematic PSUs back to our test environment and were able to reproduce the issue for a while, but the PSU is now functioning correctly. That leads me to believe there is some sort of firmware issue, since a genuinely faulty PSU should not magically fix itself.

  • Even after the OS crashed, the iDRAC reported a problem only for the PSU whose AC had been removed.
  • The PSU that failed to maintain redundancy still provided enough power to keep the iDRAC running, but not enough for the OS.
  • While the server was up, the PSUs shared the power load evenly.

I know that cold redundancy was a fairly new feature at the time of the R710's release. Maybe the PSU thinks it's in a cold redundancy state and never ramps up the power it is providing? Is there any way to check the state of the PSU's cold/warm redundancy setting? I know other servers expose this through raw ipmitool commands, but those are not publicly documented, so I don't know how to access them on Dell R710s.


RE: R710 PSU Redundancy Failure

Hello.

Once the server's PCIe slots and hard drive bays are fully populated, a single 570 W power supply may not be able to power the system, in which case redundancy is lost. Redundancy may also be lost depending on the power supply and its firmware. Please provide the part number of the power supply you are using.
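
As a rough illustration of the capacity point above, here is a minimal Python sketch of the check: redundancy only holds if the supplies left after one failure can carry the whole system load. The function name and the load figures are made up for illustration; only the 570 W and 870 W ratings come from this thread.

```python
def survives_single_psu_loss(system_load_w: float,
                             psu_rating_w: float,
                             psu_count: int = 2) -> bool:
    """Return True if the remaining supplies can carry the full load
    after any one supply drops out."""
    if psu_count < 2:
        return False  # a single supply has nothing to fall back on
    remaining_capacity_w = psu_rating_w * (psu_count - 1)
    return system_load_w <= remaining_capacity_w


# Hypothetical system loads, not measurements from an actual R710.
print(survives_single_psu_loss(500, 570))  # True
print(survives_single_psu_loss(650, 570))  # False
print(survives_single_psu_loss(650, 870))  # True
```

Note that two supplies sharing the load evenly each see only half of it, so a system can look healthy right up until one supply is lost.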

Robert Alakara

Dell EMC | Enterprise Services

TaylorData

RE: R710 PSU Redundancy Failure

We have the 870W PSUs, as we do fully load the PCIe slots. Here is additional info:

Model: N870-50

RefNO:NPS-885ABA

Rev: 08

I did not record whether they had different firmware before. Within the next week I am getting more power supplies from our site that showed the same symptoms, so I can check what firmware they have immediately after plugging them in.

I'm not certain how to create code brackets/quotes, so I'll just do a different font for now:

linux:~ # omreport chassis pwrsupplies
Power Supplies Information

Power Supply Redundancy
Redundancy Status : Full

Individual Power Supply Elements
Index : 0
Status : Ok
Location : PS 1 Status
Type : AC
Rated Input Wattage : 1080 W
Maximum Output Wattage : 870 W
Firmware Version : 03.02.67
Online Status : Presence Detected
Power Monitoring Capable : Yes

Index : 1
Status : Ok
Location : PS 2 Status
Type : AC
Rated Input Wattage : 1080 W
Maximum Output Wattage : 870 W
Firmware Version : 03.02.67
Online Status : Presence Detected
Power Monitoring Capable : Yes
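
For comparing many supplies as they come in from site, the output above could be parsed with something like the following Python sketch. It only assumes the "Key : Value" layout shown in this post; the function names are my own, and the sample text is a trimmed copy of the output above. Keep in mind that this output reported "Full" and "Ok" for the unit under test, so it can flag firmware differences but would not catch the failure itself.

```python
def parse_pwrsupplies(report: str) -> dict:
    """Parse `omreport chassis pwrsupplies` text into the redundancy
    status plus one dict of fields per power supply."""
    result = {"redundancy": None, "supplies": []}
    current = None
    for raw in report.splitlines():
        if ":" not in raw:
            continue  # section headings and blank lines
        key, _, value = raw.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Redundancy Status":
            result["redundancy"] = value
        elif key == "Index":
            # Each "Index" line starts a new power supply entry.
            current = {"Index": value}
            result["supplies"].append(current)
        elif current is not None:
            current[key] = value
    return result


def firmware_by_location(parsed: dict) -> dict:
    """Map each supply's Location to its Firmware Version."""
    return {psu.get("Location"): psu.get("Firmware Version")
            for psu in parsed["supplies"]}


# Trimmed copy of the output posted above.
SAMPLE = """\
Power Supply Redundancy
Redundancy Status : Full

Individual Power Supply Elements
Index : 0
Status : Ok
Location : PS 1 Status
Firmware Version : 03.02.67

Index : 1
Status : Ok
Location : PS 2 Status
Firmware Version : 03.02.67
"""

parsed = parse_pwrsupplies(SAMPLE)
versions = firmware_by_location(parsed)
print(parsed["redundancy"])
print(versions)
print("firmware mismatch:", len(set(versions.values())) > 1)
```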

 

 


RE: R710 PSU Redundancy Failure

I haven't been able to locate any firmware updates or release notes for this particular model. There could be a communication issue with the PSU firmware. Ensure that your iDRAC and BIOS are up to date. Otherwise, you may also try testing with an 1100 W PSU.

Robert Alakara

Dell EMC | Enterprise Services

TaylorData

RE: R710 PSU Redundancy Failure

Sorry for the slow reply; I was out for the holidays and then got sick.

So I left the PSU running, and it failed after a few more days. I was able to find documentation on how to fake a power-on press for the PSU (aka "the ATX paper clip method"). I had our Electrical Engineer take a look at it, since there was little harm to be done.

www.rcgroups.com/.../showpost.php;postcount=3354

I was able to jump start the power supply and found the following:

Ground Supply Kill C6 (to turn on)
Leave PS_ON (A6) open:
· LED green
· 12V standby is OK
· 3.3V Logic is OK
Ground PS_ON signal:
· Fan spins
· But LED becomes amber*
· Main 12V is dead*. The other rails (12V standby, 3.3V Logic) are still fine.

*Checked with a known good supply: LED is green and Main 12V is fine.

Conclusion: I suspect the main DC/DC converter (400V to 12V) has failed. The cause could be failed switching devices (power diodes, MOSFETs, or IGBTs), failed switching control units, or even a broken circuit.
If you can find a schematic, I can certainly troubleshoot further. I do not know whether there is an obviously bad component, but we could check further inside if we are willing to.
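
Since more supplies are coming back from site, the bench checks above could be recorded and sorted the same way each time. The Python sketch below is just one way to encode the reasoning; the class, field names, and result labels are mine, and the two sample readings are the failed unit and the known good unit described above.

```python
from dataclasses import dataclass


@dataclass
class BenchReading:
    standby_12v_ok: bool  # 12V standby, PS_ON left open
    logic_3v3_ok: bool    # 3.3V logic, PS_ON left open
    main_12v_ok: bool     # main 12V, measured with PS_ON grounded
    led: str              # "green" or "amber" with PS_ON grounded


def classify(reading: BenchReading) -> str:
    """Apply the same reasoning as the bench notes to one supply."""
    if not (reading.standby_12v_ok and reading.logic_3v3_ok):
        return "standby/logic rail fault"
    if not reading.main_12v_ok:
        # Standby side is alive but the main rail never comes up.
        return "main 12V converter stage suspect"
    if reading.led == "green":
        return "healthy"
    return "inconclusive"


failed_unit = BenchReading(True, True, False, "amber")
known_good = BenchReading(True, True, True, "green")

print(classify(failed_unit))  # main 12V converter stage suspect
print(classify(known_good))   # healthy
```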

 

I got two more PSUs from the site. One has already failed; the other is showing these 'pre-failure' symptoms, which are not reported.
