Article summary: This article explains why the CMC may report repeated loss and regain of power redundancy, and how to troubleshoot it.
Issue:
Repeated loss and regain of redundancy status is a PowerEdge M1000e Chassis Management Controller (CMC) issue that can occur when the chassis burden (that is, the aggregate of server allocations and infrastructure, plus fan reserve) fluctuates around the redundancy capability of the chassis, repeatedly exceeding it and then dropping back below it. A power cap set by the customer can cause this type of event to be logged repeatedly. To address this issue, we have added a configuration option named Server Performance Over Power Redundancy. Its default is TRUE, which allows server performance to increase at the expense of redundancy.
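The relationship between chassis burden and redundancy capability can be illustrated with a minimal sketch. This is not the CMC's actual power-management logic, which this article does not document. The function names, the wattage figures, and the assumption that redundancy holds while the burden is at or below the capability are all hypothetical and are used only to show how a burden hovering near the threshold produces alternating states.

```python
def chassis_burden(server_allocations_w, infrastructure_w, fan_reserve_w):
    """Aggregate burden in watts: server allocations + infrastructure + fan reserve."""
    return sum(server_allocations_w) + infrastructure_w + fan_reserve_w


def is_redundant(burden_w, redundancy_capability_w):
    """Assumed rule: redundancy holds while the burden fits within the capability."""
    return burden_w <= redundancy_capability_w


# Hypothetical figures, chosen only so the burden hovers around the capability.
REDUNDANCY_CAPABILITY_W = 5000
INFRASTRUCTURE_W = 400
FAN_RESERVE_W = 200

# Three successive samples of per-server power allocations (watts).
allocation_samples = [
    [1450, 1450, 1450],  # burden 4950 W: within capability
    [1500, 1500, 1450],  # burden 5050 W: exceeds capability
    [1450, 1450, 1400],  # burden 4900 W: within capability again
]

states = [
    is_redundant(
        chassis_burden(sample, INFRASTRUCTURE_W, FAN_RESERVE_W),
        REDUNDANCY_CAPABILITY_W,
    )
    for sample in allocation_samples
]

for sample, redundant in zip(allocation_samples, states):
    burden = chassis_burden(sample, INFRASTRUCTURE_W, FAN_RESERVE_W)
    print(burden, "redundant" if redundant else "redundancy lost")
```

Small changes in server allocation are enough to flip the state when the burden sits close to the capability, which is the pattern that appears as alternating log entries.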
You may see the following events logged:
Repeated loss and regain of redundancy:
Feb 15 09:43:42 CMC-52Q9S4J pwrmgmtd[839]: Lost redundancy! Power health set to CRITICAL.
Feb 15 09:43:45 CMC-52Q9S4J pwrmgmtd[839]: Regained redundancy! Power health set to OK.
Feb 15 09:43:47 CMC-52Q9S4J pwrmgmtd[839]: Lost redundancy! Power health set to CRITICAL.
Feb 15 09:43:52 CMC-52Q9S4J pwrmgmtd[839]: Regained redundancy! Power health set to OK.
Feb 15 09:43:52 CMC-52Q9S4J pwrmgmtd[839]: Lost redundancy! Power health set to CRITICAL.
Feb 15 09:43:57 CMC-52Q9S4J pwrmgmtd[839]: Regained redundancy! Power health set to OK.
Feb 15 09:43:58 CMC-52Q9S4J pwrmgmtd[839]: Lost redundancy! Power health set to CRITICAL.
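To judge how often redundancy is flapping, entries like those above can be counted with a short script. The sketch below assumes the log lines follow exactly the format shown in this article. The function names and the placeholder year are illustrative assumptions (syslog timestamps carry no year), not part of any Dell tool.

```python
import re
from datetime import datetime

# Matches lines such as:
# Feb 15 09:43:42 CMC-52Q9S4J pwrmgmtd[839]: Lost redundancy! Power health set to CRITICAL.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ pwrmgmtd\[\d+\]: "
    r"(?P<event>Lost|Regained) redundancy!"
)


def redundancy_events(lines, year=2000):
    """Return (timestamp, 'Lost' or 'Regained') tuples for matching lines.

    The year is a placeholder because the log format does not include one.
    """
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            ts = datetime.strptime(f"{year} {match['ts']}", "%Y %b %d %H:%M:%S")
            events.append((ts, match["event"]))
    return events


def count_transitions(events):
    """Count changes between the Lost and Regained states in consecutive events."""
    return sum(1 for prev, cur in zip(events, events[1:]) if prev[1] != cur[1])


# First three entries from the excerpt in this article.
SAMPLE = [
    "Feb 15 09:43:42 CMC-52Q9S4J pwrmgmtd[839]: Lost redundancy! Power health set to CRITICAL.",
    "Feb 15 09:43:45 CMC-52Q9S4J pwrmgmtd[839]: Regained redundancy! Power health set to OK.",
    "Feb 15 09:43:47 CMC-52Q9S4J pwrmgmtd[839]: Lost redundancy! Power health set to CRITICAL.",
]

events = redundancy_events(SAMPLE)
elapsed = (events[-1][0] - events[0][0]).total_seconds()
print(len(events), "events,", count_transitions(events), "transitions in", elapsed, "seconds")
```

Many transitions within a few seconds, as in the excerpt, suggest that the chassis burden is hovering near the redundancy capability rather than that a power supply has failed outright.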
Solution:
Although this is not the only condition that can cause redundancy-lost events, it is a good place to start when troubleshooting.
Try the following steps when troubleshooting this type of issue:
- Ensure that the PDU capability of the data center for that chassis allows System Input Power Cap to be increased to a value higher than the customer's current power cap.
- In the GUI, go to Chassis Overview -> Power -> Configuration, and increase System Input Power Cap to the maximum supported by the CMC.
- Clear the Server Performance Over Power Redundancy check box, and click Apply.
- Refresh the GUI page, and verify that System Input Power Cap is shown as 100% and that Server Performance Over Power Redundancy is cleared.