We have had a few servers fail to stay up when just one of their PSUs goes down. We brought one of the problematic PSUs back to our test environment and were able to reproduce the issue for a while, but the PSU is now functioning correctly. This leads me to believe there is some sort of firmware issue, as a genuinely faulty PSU should not magically fix itself.
I know that cold redundancy was a fairly new feature at the time of the R710's release. Maybe the PSU thinks it's in a cold redundancy state and never ramps up the power it's providing? Is there any way to check the state of the PSU's cold/warm redundancy setting? I know other servers expose this through raw ipmitool commands, but those are not publicly documented, so I don't know how to access them on Dell R710s.