M1000e PMBus issue / FRU access failed
Hi All,
We have an M1000e with 4 PSUs installed, 9 fans, 6 switches and 4 M610s, all UK based.
Today I went to Home Page >> Power >> Budget Status and noticed:
Name   Power State   Input Volts   Input Current   Output Rated Power
PS-1   Online        239.5 V       0.8 A           2360 W
PS-2   Online        240.5 V       0.8 A           2360 W
PS-3
PS-4
PS-5   Online        0.0 V         0.0 A           2360 W
PS-6   Online        0.0 V         0.0 A           2360 W
PS-5 and PS-6 looked wrong, so I grabbed another PSU from spares and swapped units around. A known-good PSU in slot 4 came up as Failed. I moved the PS-5 unit (showing Online but 0.0 V) to slot 3, where it came up Online and working, with all voltages reading. I then moved the spare PSU from slot 4 to slot 3, where it came Online; moved to slot 5, it showed Failed.
So it's looking like PS-4, PS-5 and probably PS-6 have an issue. Both lights were on for PS-5 before the swap round. PS-6 still has both lights on AND shows healthy in the GUI, plus Online, but, as above, shows 0.0 V and 0.0 A.
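The symptom here is a mismatch: the GUI says Online, yet input volts and current read 0.0. A minimal Python sketch of spotting that pattern, assuming the rows are copied as plain text from the Budget Status page in the column order shown above; `parse_row` and `online_but_dead` are hypothetical helpers, not part of any Dell tool.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PsuRow:
    name: str
    state: Optional[str]    # None when the row has no readings
    volts: Optional[float]
    amps: Optional[float]

def parse_row(line: str) -> PsuRow:
    """Parse one row laid out as: NAME STATE VOLTS V AMPS A [RATED W]."""
    parts = line.split()
    if len(parts) < 6:
        # Only the slot name is present, so there is nothing to read.
        return PsuRow(parts[0], None, None, None)
    return PsuRow(parts[0], parts[1], float(parts[2]), float(parts[4]))

def online_but_dead(rows: List[PsuRow]) -> List[str]:
    """Names of PSUs reported Online while their input reads 0.0 V."""
    return [r.name for r in rows if r.state == "Online" and r.volts == 0.0]

# Rows copied from the first Budget Status table in this post.
TABLE = """\
PS-1 Online 239.5 V 0.8 A 2360 W
PS-2 Online 240.5 V 0.8 A 2360 W
PS-3
PS-4
PS-5 Online 0.0 V 0.0 A 2360 W
PS-6 Online 0.0 V 0.0 A 2360 W
"""

rows = [parse_row(line) for line in TABLE.splitlines() if line.strip()]
print(online_but_dead(rows))  # ['PS-5', 'PS-6']
```

A unit the CMC already marks as Failed is deliberately not flagged, since that state is already visible in the GUI; the check only targets the "healthy but reading nothing" case.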
We also switched the redundancy policy from Power Supply Redundancy (the policy in effect when the table above was captured) to AC Redundancy, with no change.
In both the RACADM logs and the CMC logs in the GUI, we have lots of:
PSU6 PMBus readings suspended for 28800 s due to errors.
PSU5 PMBus readings suspended for 28800 s due to errors.
and
PSU4 FRU access failed.
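For a write-up, note that 28800 s is 8 hours (28800 / 3600). The Python sketch below groups the two quoted message types by PSU slot and converts the suspension time to hours. It assumes the message text appears exactly as quoted above; `summarize` and the issue labels are hypothetical names, not CMC or RACADM output fields.

```python
import re
from collections import defaultdict

# Patterns for the two CMC messages quoted above (message text only;
# timestamps and other log fields are assumed to be ignorable).
SUSPENDED = re.compile(r"PSU(\d+) PMBus readings suspended for (\d+) s due to errors\.")
FRU_FAIL = re.compile(r"PSU(\d+) FRU access failed\.")

def summarize(lines):
    """Map PSU slot number -> list of (issue, hours_suspended_or_None)."""
    issues = defaultdict(list)
    for line in lines:
        if m := SUSPENDED.search(line):
            hours = int(m.group(2)) / 3600
            issues[int(m.group(1))].append(("pmbus_suspended", hours))
        elif m := FRU_FAIL.search(line):
            issues[int(m.group(1))].append(("fru_access_failed", None))
    return dict(issues)

log = [
    "PSU6 PMBus readings suspended for 28800 s due to errors.",
    "PSU5 PMBus readings suspended for 28800 s due to errors.",
    "PSU4 FRU access failed.",
]
print(summarize(log))
# {6: [('pmbus_suspended', 8.0)], 5: [('pmbus_suspended', 8.0)], 4: [('fru_access_failed', None)]}
```

The slot numbers that come out (4, 5 and 6) line up with the slots that misbehaved in the swap test, which is the kind of correlation worth recording on a cheat sheet.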
Chassis Power Supplies, as last updated:
Name   Power State   Input Volts   Input Current   Output Rated Power
PS-1   Online        239.5 V       0.8 A           2360 W
PS-2   Online        240.0 V       0.8 A           2360 W
PS-3   Online        239.0 V       0.8 A           2360 W
PS-4   Failed        0.0 V         0.0 A
PS-5   Failed        0.0 V         0.0 A
PS-6   Online        0.0 V         0.0 A           2360 W
(Yes, I got another spare PSU to fully populate the chassis and see what was going on.) :-)
A quick Google search for the two errors above turns up nothing, hence posting here. Any thoughts? :-)
DELL-Chris H
Moderator
August 8th, 2013 10:00
SCP-UK,
Did you happen to have a power outage or other power event? If so, you may be able to reseat the power supplies in question and then fail over the CMC. That MAY clear the issue. More than likely, though, you will need to power down the blades and then the entire enclosure, remove the power cords for a minute, reinsert them, and power everything back up. That should show all power supplies online. If not, swap the power supplies around and see whether the error follows specific power supplies. If it does, those units might actually have failed.
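Checking whether an error follows the unit or the slot comes down to a simple rule: a unit that works in any slot is not itself faulty, and a slot where only known-good units have failed is the suspect. A Python sketch of that heuristic, using the swap results from the first post as data. The unit labels ("spare", "ex-PS5") are hypothetical, and counting the Online-but-0.0 V reading as a failure is an assumption.

```python
from collections import defaultdict

# (unit_label, slot, worked). Labels are hypothetical names for physical PSUs.
observations = [
    ("ex-PS5", 5, False),  # assumption: Online but 0.0 V counted as a failure
    ("spare", 4, False),
    ("ex-PS5", 3, True),
    ("spare", 3, True),
    ("spare", 5, False),
]

def localize(obs):
    """Return (suspect_slots, suspect_units) from swap-test observations."""
    by_slot = defaultdict(list)
    by_unit = defaultdict(list)
    for unit, slot, ok in obs:
        by_slot[slot].append((unit, ok))
        by_unit[unit].append((slot, ok))

    good_units = {u for u, res in by_unit.items() if any(ok for _, ok in res)}
    good_slots = {s for s, res in by_slot.items() if any(ok for _, ok in res)}

    # Suspect slot: nothing ever worked there, and a known-good unit failed in it.
    bad_slots = sorted(s for s, res in by_slot.items()
                       if s not in good_slots and any(u in good_units for u, _ in res))
    # Suspect unit: it never worked, and it failed in a known-good slot.
    bad_units = sorted(u for u, res in by_unit.items()
                       if u not in good_units and any(s in good_slots for s, _ in res))
    return bad_slots, bad_units

print(localize(observations))  # ([4, 5], [])
```

A "suspect slot" result does not distinguish failed hardware from a communication problem on that slot; later in this thread the errors cleared after a CMC firmware update.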
Also, is the CMC up to date on this enclosure?
Let me know what you find.
SCP_UK
August 8th, 2013 10:00
Hey Chris
Well, I was hoping not to have to power the whole enclosure down and back up. We certainly moved the PSUs around between slots 3, 4 and 5, proving that each one works in slot 3 without issue, so I'm fairly happy the PSUs are A-OK.
The CMC is on 4.31 and the latest version is 4.45, so I'll update it tomorrow when I'm in the office. We have a spare CMC as well, so I'll do the update on that first to make sure it doesn't break anything :)
Is it worth resetting the CMC via racadm? I don't mind trying this now, as long as it doesn't power down the blades (it didn't last time, when we upgraded it) :)
DELL-Chris H
Moderator
August 8th, 2013 11:00
Glad to hear that. In this case the errors were telling you that the PMBus readings had been suspended. That doesn't mean the device has failed, only that there was a communication issue in reporting its readings.
DELL-Chris H
Moderator
August 8th, 2013 11:00
It shouldn't cause your blades to shut down. Be aware that the failover method doesn't always clear the issue.
SCP_UK
August 8th, 2013 11:00
Hi Chris,
Bingo - that upgrade worked a treat - all issues resolved :-)
Do you know what those errors actually mean, though, or why they happen? I'd just like to write things up so that we have a cheat sheet if this happens again :-)