August 8th, 2013 09:00

M1000e PMBus issue / FRU access failed

Hi All,

We have an M1000e with 4 PSUs installed, 9 fans, 6 switches and 4 M610s, all UK based.

Today I went to Home Page >> Power >> Budget Status and noticed:

Name  Power State  Input Volts  Input Current  Output Rated Power
PS-1  Online       239.5 V      0.8 A          2360 W
PS-2  Online       240.5 V      0.8 A          2360 W
PS-3
PS-4
PS-5  Online       0.0 V        0.0 A          2360 W
PS-6  Online       0.0 V        0.0 A          2360 W

PS-5 and PS-6 looked strange, so I grabbed another PSU from spares and did a swap round. A known-working PSU in slot 4 comes up as Failed. PS-5 (showing Online but 0.0 V) moved to slot 3 comes up Online and working, with all voltages. The spare PSU moved from slot 4 to slot 3 comes Online; moved to slot 5, it shows Failed.

So it's looking like slots PS-4, PS-5 and probably PS-6 have an issue. Both lights were on on PS-5 before the swap round. PS-6 still has both lights on AND shows healthy and Online in the GUI but, as above, reads 0.0 V and 0.0 A.

We also changed the redundancy policy from Power Supply Redundancy (in effect when the table above was taken) to AC Redundancy, with no change.

 
In the RACADM logs and the CMC logs in the GUI, we have lots of:

PSU6 PMBus readings suspended for 28800 s due to errors.
PSU5 PMBus readings suspended for 28800 s due to errors.

and

PSU4 FRU access failed.
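
For anyone keeping a cheat sheet, here is a minimal Python sketch that groups these messages by PSU slot. It assumes the log has already been exported to plain text and that lines look exactly like the two shapes quoted above; the function name `summarize` is made up for illustration and is not part of any Dell tool.

```python
import re

# Patterns cover only the two message shapes quoted in this post;
# other CMC log formats are not handled.
SUSPENDED = re.compile(r"PSU(\d+) PMBus readings suspended for (\d+) s due to errors\.")
FRU_FAILED = re.compile(r"PSU(\d+) FRU access failed\.")

def summarize(lines):
    """Return {slot number: set of issue strings} for the given log lines."""
    issues = {}
    for line in lines:
        m = SUSPENDED.search(line)
        if m:
            slot, secs = int(m.group(1)), int(m.group(2))
            # Convert the suspension period from seconds to hours.
            issues.setdefault(slot, set()).add(f"PMBus suspended {secs / 3600:g} h")
            continue
        m = FRU_FAILED.search(line)
        if m:
            issues.setdefault(int(m.group(1)), set()).add("FRU access failed")
    return issues

sample = [
    "PSU6 PMBus readings suspended for 28800 s due to errors.",
    "PSU5 PMBus readings suspended for 28800 s due to errors.",
    "PSU4 FRU access failed.",
]
for slot, found in sorted(summarize(sample).items()):
    print(f"PSU{slot}: {', '.join(sorted(found))}")
```

Note that 28800 s works out to 8 hours, so each of these entries means readings for that slot were paused for 8 hours.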

Latest Chassis Power Supplies status:

Name  Power State  Input Volts  Input Current  Output Rated Power
PS-1  Online       239.5 V      0.8 A          2360 W
PS-2  Online       240.0 V      0.8 A          2360 W
PS-3  Online       239.0 V      0.8 A          2360 W
PS-4  Failed       0.0 V        0.0 A
PS-5  Failed       0.0 V        0.0 A
PS-6  Online       0.0 V        0.0 A          2360 W

(Yes, I got another spare PSU to fully populate the chassis to see what was going on) :-)

A quick Google search for the two errors above brings nothing up, hence being here. Any thoughts? :-)
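
The odd case in the table above is a supply reported Online while reading 0.0 V. A rough Python sketch for spotting that in pasted table text follows; it assumes whitespace-separated columns as shown in this post, and `suspicious` is a hypothetical helper name, not anything provided by the CMC.

```python
import re

# Matches rows such as "PS-6  Online  0.0 V  0.0 A  2360 W".
# Rows with no state or readings (empty slots) simply do not match.
ROW = re.compile(r"^(PS-\d+)\s+(\w+)\s+([\d.]+) V\s+([\d.]+) A")

def suspicious(table_text):
    """Names of supplies reported Online while showing 0.0 V input."""
    flagged = []
    for line in table_text.splitlines():
        m = ROW.match(line.strip())
        if m and m.group(2) == "Online" and float(m.group(3)) == 0.0:
            flagged.append(m.group(1))
    return flagged

table = """\
PS-1  Online   239.5 V  0.8 A   2360 W
PS-2  Online   240.0 V  0.8 A   2360 W
PS-3  Online   239.0 V  0.8 A   2360 W
PS-4  Failed   0.0 V   0.0 A
PS-5  Failed   0.0 V   0.0 A
PS-6  Online   0.0 V   0.0 A   2360 W
"""
print(suspicious(table))
```

A hit here only says the reading and the state disagree; it does not say whether the supply, the slot or the reporting path is at fault.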

 

Moderator

 • 

8.8K Posts

August 8th, 2013 10:00

SCP-UK,

Did you happen to have a power outage or other power event? If so, you may be able to reseat the affected power supplies and then fail over the CMC. That MAY clear the issue. More than likely, though, you will need to power down the blades and then the entire enclosure, remove the power cords for a minute, then reinsert them and power everything up. That should show all power supplies online. If not, swap the power supplies around and see whether the error follows specific supplies. If it does, they may actually have failed.

Also, is the CMC up to date on this enclosure?

Let me know what you find.

14 Posts

August 8th, 2013 10:00

Hey Chris

Well, I was hoping not to have to power the whole enclosure down and back up. We certainly moved the PSUs around in slots 3, 4 and 5, proving that each works in slot 3 without issue (so I'm fairly happy the PSUs are OK).

The CMC is sitting at 4.31 and the latest version is 4.45, so I shall update it tomorrow when in the office. We have a spare CMC as well, so I'll do the update on that first to make sure it doesn't break anything :)

Is it worth doing a reset of the CMC via racadm? I don't mind trying this now, as long as it doesn't power down the blades (it didn't last time when we upgraded it) :)

Moderator

 • 

8.8K Posts

August 8th, 2013 11:00

Glad to hear that. In this case the logs were flagging that PMBus reporting had been suspended. That doesn't mean the device itself has failed, only that there was a problem communicating with it to report its readings.

Moderator

 • 

8.8K Posts

August 8th, 2013 11:00

It shouldn't cause your blades to shut down. The failover method doesn't always clear the issue, though.

14 Posts

August 8th, 2013 11:00

Hi Chris,

Bingo, that upgrade worked a treat. All issues resolved :-)

Do you know what those errors actually mean, though, or why they occur? I'd just like to write things up so that if this happens again we have a cheat sheet :-)
