Unsolved

February 16th, 2015 14:00

Critical Alerts Indicating "The Perc1 Battery Has Failed"

About ten of my company's roughly 100 R720s have reported "Perc1 battery has failed", generally followed by "Perc1 battery is Operating Normally". All of them use the PERC H710 Mini at firmware 21.2.0-0007, A04. The alerts often repeat every 90 days, which suggests they are related to the Transparent Learn Cycle (TLC).

In most cases the R720s are in production and running our hypervisor, which prevents us from collecting a DSET or the RAID controller log. In the few cases where we have provided a RAID controller log, Dell support has indicated that the battery's Full Charge Capacity was too low, so they dispatched a new PERC battery.

Generally we can only provide the iDRAC .sel showing the two entries, "Perc1 battery failed" followed by "Perc1 battery Operating Normally" (file attached). In these cases Dell has not dispatched a battery and has suggested that the issue is related to the 90-day battery learning cycle.

If the Learning Cycle is supposed to be transparent, why are we getting these alerts?

Are there known PERC battery reporting issues with the firmware level I mentioned? We suspect so because in one case we replaced the battery and the problem returned a few months later.

Most of the batteries reporting the failure have a 2011 manufacture date. Is this towards the end of their life? 

Thanks in advance!

Mark

 

1 Attachment

Moderator

February 17th, 2015 07:00

MarkScience1,

I would agree with what support said up to a point, but not about the PERC battery failed error, which isn't going to be associated with the 90-day battery conditioning. You may see the charging/discharging notifications, but Battery Failed is an error. The first thing I would do with RAID controller battery errors, especially across multiple systems, is make sure the servers are up to date. RAID controllers can produce false battery errors when their firmware gets out of date. 21.2.0-0007 is not very far back, but I would look at updating the BIOS and iDRAC as well as bringing the RAID controller current. This will require reboots, so you would need to schedule downtime accordingly.

If you let me know the specific OS, I can get you links to the updates. Alternatively, you can access the Lifecycle Controller via F10 at startup, then proceed to Platform Update and use the FTP option.

Let me know.

February 18th, 2015 09:00

Thanks for the explanation. Dell support had previously led me to believe that the Perc1 battery failure may be a normal part of the relearn cycle, but that didn't seem right. Updating the R720s has been suggested previously, but the servers are in production and it takes months to schedule an outage for firmware upgrades. Also, we have not found a specific update that is documented as addressing this, and we don't want the issue returning after we have scheduled an upgrade.

We had a Perc1 battery failure on Monday, and I captured a full DSET and provided it to Dell support. In this case the technician dispatched a battery because he believed that the relearn cycle kicked in early, but after reviewing the log closely I see that the relearn cycle and battery failure occurred exactly 90 days after the last TLC. I'm attaching the RAID log for you.

We do see this sequence in the RAID log. Is this enough evidence to say that the battery really needs replacement, or is this determined strictly by the battery capacity values?   

02/15/15 22:23:18: EVT#06267-02/15/15 22:23:18: 151=Battery relearn started

02/15/15 22:24:23: EVT#06268-02/15/15 22:24:23: 152=Battery relearn in progress

02/16/15  1:06:53: EVT#06269-02/16/15  1:06:53: 148=Battery is discharging

02/16/15  2:32:28: EVT#06270-02/16/15  2:32:28: 378=Battery life has degraded and cannot initiate transparent learn cycles
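For anyone who wants to scan a full RAID controller log for the same pattern, here is a minimal Python sketch. It assumes every entry follows the format of the four lines quoted above (an `EVT#` sequence number, a timestamp, then `code=message`); other firmware levels may format entries differently. The function names are hypothetical, and the event codes 151 and 378 are taken only from this excerpt. It reports how long after a relearn started the "battery life has degraded" event appeared. It does not decide whether the battery needs replacing.

```python
import re
from datetime import datetime

# Pattern inferred from the four log lines quoted in this post (assumption:
# all entries in the controller log use this same layout).
EVENT_RE = re.compile(
    r"EVT#(?P<seq>\d+)-(?P<date>\d{2}/\d{2}/\d{2})\s+"
    r"(?P<time>\d{1,2}:\d{2}:\d{2}):\s+(?P<code>\d+)=(?P<text>.*)"
)

# Event codes as they appear in the excerpt above.
RELEARN_STARTED = 151
LEARN_DEGRADED = 378


def parse_events(lines):
    """Return a list of (timestamp, code, message) for lines that match."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue  # skip blank lines and anything in another format
        when = datetime.strptime(f"{m['date']} {m['time']}", "%m/%d/%y %H:%M:%S")
        events.append((when, int(m["code"]), m["text"].strip()))
    return events


def degraded_after_relearn(events):
    """Pair each relearn start with the next 'degraded' event, if any."""
    pairs = []
    start = None
    for when, code, _ in events:
        if code == RELEARN_STARTED:
            start = when
        elif code == LEARN_DEGRADED and start is not None:
            pairs.append((start, when))
            start = None
    return pairs


sample = [
    "02/15/15 22:23:18: EVT#06267-02/15/15 22:23:18: 151=Battery relearn started",
    "02/15/15 22:24:23: EVT#06268-02/15/15 22:24:23: 152=Battery relearn in progress",
    "02/16/15  1:06:53: EVT#06269-02/16/15  1:06:53: 148=Battery is discharging",
    "02/16/15  2:32:28: EVT#06270-02/16/15  2:32:28: 378=Battery life has degraded "
    "and cannot initiate transparent learn cycles",
]

events = parse_events(sample)
pairs = degraded_after_relearn(events)
for start, degraded in pairs:
    print(f"relearn started {start}; degraded event followed after {degraded - start}")
```

To run it against a real log, replace `sample` with the lines read from the exported log file.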

 

In this case the battery shows a manufacture date of 2011. Is that considered near the end of its life?

 

Thank you, Mark

1 Attachment
