Avamar Gen4S Hardware: Learn Cycle Time Out
Summary: The RAID controller battery learn cycle times out on Avamar Gen4S hardware.
Symptoms
The following error appears in the Avamar UI or the logs:
<MRMON154> Controller ID: 0 Battery relearn timed out
Cause
This is under review by the vendor.
Resolution
1. Log in to the Avamar server as admin, elevate to root, and load the SSH keys.
For instructions on loading keys, see Avamar: How to Log in to an Avamar Server and Load Various Keys.
2. Using the information from the UI event or the Dial Home Service Request:
a. Determine the node that produced the error message.
b. Connect to that node as root:
ssn 0.# --user=root
(where 0.# is the physical node number)
3. Decompress any rotated /var/log/messages files, using the command that matches their compression format:
bunzip2 /var/log/messages*.bz2
gunzip /var/log/messages*.gz
xz --decompress /var/log/messages*.xz
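Instead of trying each command in turn, the decompression tool can be chosen from each file's suffix. This is a minimal sketch, assuming rotated files end in .bz2, .gz, or .xz; the function names decompress_tool and decompress_messages are hypothetical.

```shell
#!/bin/sh
# decompress_tool (hypothetical helper): print the decompression command for
# a log file based on its suffix; print nothing for uncompressed files.
decompress_tool() {
  case "$1" in
    *.bz2) echo "bunzip2" ;;
    *.gz)  echo "gunzip" ;;
    *.xz)  echo "xz --decompress" ;;
    *)     echo "" ;;
  esac
}

# decompress_messages (hypothetical helper): decompress every compressed
# /var/log/messages* file in place and leave plain-text files untouched.
decompress_messages() {
  for f in /var/log/messages*; do
    [ -e "$f" ] || continue          # glob matched nothing
    tool=$(decompress_tool "$f")
    [ -n "$tool" ] || continue       # not compressed, nothing to do
    $tool "$f"                       # unquoted so "xz --decompress" splits into command and option
  done
}
```

Run decompress_messages as root on the node; files that are already plain text are skipped rather than passed to a decompressor.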
4. Search the messages logs (/var/log/messages and its rotated copies) for battery relearn events:
grep -ih "battery relearn" /var/log/messages*
Jul 29 13:37:12 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON157> Controller ID: 0 Battery relearn will start in 4 days
Jul 31 13:37:48 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON158> Controller ID: 0 Battery relearn will start in 2 days
Aug 1 13:37:33 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON159> Controller ID: 0 Battery relearn will start in 1 day
Aug 2 08:37:13 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON160> Controller ID: 0 Battery relearn will start in 5 hours
Aug 2 13:38:24 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON155> Controller ID: 0 Battery relearn pending: Battery is under charge
Aug 2 13:39:28 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON151> Controller ID: 0 Battery relearn started
Aug 2 13:40:36 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON152> Controller ID: 0 Battery relearn in progress
Aug 2 13:40:36 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON153> Controller ID: 0 Battery relearn completed
Aug 13 16:32:15 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON155> Controller ID: 0 Battery relearn pending: Battery is under charge
Aug 13 16:44:10 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON151> Controller ID: 0 Battery relearn started
Aug 13 16:45:15 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON152> Controller ID: 0 Battery relearn in progress
Aug 13 16:48:30 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON154> Controller ID: 0 Battery relearn timed out
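To see how long each relearn ran, pair every "relearn started" entry with the next "completed" or "timed out" entry and subtract the timestamps. This is a minimal awk sketch; the function name relearn_durations is hypothetical, and because syslog timestamps carry no year it assumes the lines arrive in chronological order within a single non-leap year.

```shell
#!/bin/sh
# relearn_durations (hypothetical helper): read MR_MONITOR syslog lines on
# stdin, in chronological order, and print how long each battery relearn ran
# from "relearn started" to "relearn completed" or "relearn timed out".
# Syslog timestamps carry no year, so this sketch assumes all events fall
# within one non-leap calendar year.
relearn_durations() {
  awk '
    BEGIN {
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mon, " ")
      split("0 31 59 90 120 151 181 212 243 273 304 334", cum, " ")
      for (i = 1; i <= 12; i++) days_before[mon[i]] = cum[i]
    }
    # Seconds since Jan 1 00:00:00 for the "Mon DD HH:MM:SS" prefix of the record.
    function ts(   t) {
      split($3, t, ":")
      return (days_before[$1] + $2 - 1) * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
    }
    /[Bb]attery relearn started/ { start = ts(); running = 1; next }
    running && /[Bb]attery relearn (completed|timed out)/ {
      result = ($0 ~ /timed out/) ? "timed out" : "completed"
      printf "%s %s %s: %s after %d s\n", $1, $2, $3, result, ts() - start
      running = 0
    }
  '
}
```

For example, grep -i "battery relearn" /var/log/messages | relearn_durations. Applied to the output above, it reports that the Aug 2 relearn completed after 68 s and the Aug 13 relearn timed out after 260 s.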
5. Use CmdTool2 to confirm that the learn cycle failed (Learn Cycle Status : Failed) and that the battery voltage is not 0 mV:
CmdTool2 -AdpBbuCmd -GetBbuStatus -a0
BBU status for Adapter: 0
BatteryType: CVPM02
Voltage: 9563 mV
Current: 0 mA
Temperature: 30 C
BBU Firmware Status:
Charging Status : None
Voltage : OK
Temperature : OK
Learn Cycle Requested : No
Learn Cycle Active : No
Learn Cycle Status : Failed
Learn Cycle Timeout : Yes
I2c Errors Detected : No
Battery Pack Missing : No
Battery Replacementrequired : No
Remaining CapacityLow : No
Periodic Learn Required : No
Transparent Learn : No
No space to cache offload : No
Pack is about to fail & should be replaced : No
Cache Offload premium feature required : No
Module microcode update required : No
GasGuageStatus:
Fully Discharged : No
FullyCharged : Yes
Discharging : Yes
Initialized : No
Remaining Time Alarm : No
Remaining Capacity Alarm: No
Discharge Terminated : No
OverTemperature : No
Charging Terminated : Yes
Over Charged : No
Pack energy : 96 J
Capacitance : 100
Remaining reserve space : 93
Exit Code: 0x00
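The two conditions in this step can also be checked with a small awk filter. This is a minimal sketch: the function name check_learn_failed is hypothetical, and it assumes the field layout shown above, where the voltage reading appears as "Voltage: <n> mV" and the firmware flags use "<name> : <value>".

```shell
#!/bin/sh
# check_learn_failed (hypothetical helper): read CmdTool2 -GetBbuStatus output
# on stdin, print the relevant fields, and exit 0 only when the learn cycle
# failed while the battery still reports a non-zero voltage.
check_learn_failed() {
  awk -F':' '
    # Strip leading and trailing blanks from a field.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    { key = trim($1); val = trim($2) }
    # "Voltage: 9563 mV" is the reading; "Voltage : OK" is only a status flag.
    key == "Voltage" && val ~ /mV$/ { mv = val + 0 }
    key == "Learn Cycle Status"     { status = val }
    key == "Learn Cycle Timeout"    { timeout = val }
    END {
      printf "status=%s timeout=%s voltage_mV=%d\n", status, timeout, mv
      exit !(status == "Failed" && mv > 0)
    }
  '
}
```

For example, CmdTool2 -AdpBbuCmd -GetBbuStatus -a0 | check_learn_failed prints status=Failed timeout=Yes voltage_mV=9563 for the output above and exits 0.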
6. Start a manual learn cycle:
sudo CmdTool2 -AdpBbuCmd -BbuLearn -a0
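A relearn can stay pending for many hours while the battery charges, so rather than checking by hand you can poll the controller until the cycle is over. This is a minimal sketch under stated assumptions: the function names are hypothetical, the CMDTOOL default is the path that appears in the sudo log entries in this article, the poll interval and limit (300 s, 288 polls, about 24 hours) are arbitrary, and it assumes Learn Cycle Requested reads Yes while the relearn is pending.

```shell
#!/bin/sh
# learn_pending_or_active (hypothetical helper): read GetBbuStatus output on
# stdin and exit 0 while a learn cycle is requested or still running.
learn_pending_or_active() {
  awk -F':' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    { key = trim($1); val = trim($2) }
    key == "Learn Cycle Requested" && val == "Yes" { busy = 1 }
    key == "Learn Cycle Active"    && val == "Yes" { busy = 1 }
    END { exit !busy }
  '
}

# wait_for_learn (hypothetical helper): poll adapter 0 every INTERVAL seconds,
# up to MAX_POLLS times, then print the final learn cycle fields.
wait_for_learn() {
  cmdtool=${CMDTOOL:-/opt/MegaRAID/CmdTool2/CmdTool2}
  interval=${1:-300}
  max_polls=${2:-288}
  n=0
  while "$cmdtool" -AdpBbuCmd -GetBbuStatus -a0 | learn_pending_or_active; do
    n=$((n + 1))
    if [ "$n" -ge "$max_polls" ]; then
      echo "learn cycle still pending or active after $n polls" >&2
      return 1
    fi
    sleep "$interval"
  done
  "$cmdtool" -AdpBbuCmd -GetBbuStatus -a0 | grep -E 'Learn Cycle (Status|Timeout)'
}
```

Run wait_for_learn as root after starting the learn cycle; when it returns, compare the printed fields with the outputs shown in step 8.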
7. Review the messages log to confirm that the learn cycle started and completed, and note how long it ran.
Example 1:
Aug 26 12:15:01 AVAMAR-GRID-VAR-LOG-MESSAGE syslog-ng[3170]: Configuration reload request received, reloading configuration;
Aug 26 12:15:01 AVAMAR-GRID-VAR-LOG-MESSAGE syslog-ng[3170]: New configuration initialized;
Aug 26 12:15:11 AVAMAR-GRID-VAR-LOG-MESSAGE sudo: admin : TTY=pts/0 ; PWD=/data01/home/admin ; USER=root ; COMMAND=/opt/MegaRAID/CmdTool2/CmdTool2 -pdlist -a0 -nolog
Aug 26 12:16:10 AVAMAR-GRID-VAR-LOG-MESSAGE sudo: admin : TTY=pts/0 ; PWD=/data01/home/admin ; USER=root ; COMMAND=/opt/MegaRAID/CmdTool2/CmdTool2 -AdpBbuCmd -GetBbuStatus -a0
Aug 26 12:18:28 AVAMAR-GRID-VAR-LOG-MESSAGE sudo: admin : TTY=pts/0 ; PWD=/data01/home/admin ; USER=root ; COMMAND=/opt/MegaRAID/CmdTool2/CmdTool2 -AdpBbuCmd -BbuLearn -a0
Aug 26 12:18:31 AVAMAR-GRID-VAR-LOG-MESSAGE MR_MONITOR[5742]: Controller ID: 0 Battery relearn pending: Battery is under charge
Aug 26 12:19:36 AVAMAR-GRID-VAR-LOG-MESSAGE MR_MONITOR[5742]: Controller ID: 0 Battery relearn started
Aug 26 12:20:02 AVAMAR-GRID-VAR-LOG-MESSAGE sudo: admin : TTY=pts/0 ; PWD=/data01/home/admin ; USER=root ; COMMAND=/opt/MegaRAID/CmdTool2/CmdTool2 -AdpBbuCmd -GetBbuStatus -a0
Aug 26 12:20:44 AVAMAR-GRID-VAR-LOG-MESSAGE MR_MONITOR[5742]: Controller ID: 0 Battery relearn in progress
Aug 26 12:20:44 AVAMAR-GRID-VAR-LOG-MESSAGE MR_MONITOR[5742]: Controller ID: 0 Battery relearn completed
Aug 26 12:22:25 AVAMAR-GRID-VAR-LOG-MESSAGE sudo: admin : TTY=pts/0 ; PWD=/data01/home/admin ; USER=root ; COMMAND=/opt/MegaRAID/CmdTool2/CmdTool2 -AdpBbuCmd -GetBbuStatus -a0
Although the relearn appears to have completed, it finished only about a minute after it started (12:19:36 to 12:20:44), which is abnormally short.
Example 2:
Aug 26 01:30:23 AVATPCKVS41N05 sudo: root : TTY=pts/0 ; PWD=/root ; USER=root ; COMMAND=/opt/MegaRAID/CmdTool2/CmdTool2 -AdpBbuCmd -BbuLearn -a0
Aug 26 01:31:12 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON155> Controller ID: 0 Battery relearn pending: Battery is under charge
Aug 26 16:44:10 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON151> Controller ID: 0 Battery relearn started
Aug 26 16:45:15 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON152> Controller ID: 0 Battery relearn in progress
Aug 26 16:48:30 AVATPCKVS41N05 MR_MONITOR[8625]: <MRMON154> Controller ID: 0 Battery relearn timed out
Here, the relearn started but timed out about four minutes later.
8. Recheck the battery status:
CmdTool2 -AdpBbuCmd -GetBbuStatus -a0
If Learn Cycle Status is OK and Learn Cycle Timeout is No, as in the output below, no further action is required.
BBU status for Adapter: 0
BatteryType: CVPM02
Voltage: 9563 mV
Current: 0 mA
Temperature: 30 C
BBU Firmware Status:
Charging Status : None
Voltage : OK
Temperature : OK
Learn Cycle Requested : No
Learn Cycle Active : No
Learn Cycle Status : OK
Learn Cycle Timeout : No
I2c Errors Detected : No
...
If the learn cycle still fails, as in the output below, create a Service Request and include this output so that Support can determine whether a node replacement is required.
BBU status for Adapter: 0
BatteryType: CVPM02
Voltage: 9563 mV
Current: 0 mA
Temperature: 30 C
BBU Firmware Status:
Charging Status : None
Voltage : OK
Temperature : OK
Learn Cycle Requested : No
Learn Cycle Active : No
Learn Cycle Status : Failed
Learn Cycle Timeout : Yes
I2c Errors Detected : No
...
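The decision in step 8 depends only on two fields, so it can be scripted. This is a minimal sketch; the function name bbu_verdict and its messages are hypothetical, and it assumes the "<name> : <value>" layout shown in the outputs above.

```shell
#!/bin/sh
# bbu_verdict (hypothetical helper): read GetBbuStatus output on stdin and
# report the step 8 outcome. Exit 0 when no action is needed, 1 otherwise.
bbu_verdict() {
  awk -F':' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    { key = trim($1); val = trim($2) }
    key == "Learn Cycle Status"  { status = val }
    key == "Learn Cycle Timeout" { timeout = val }
    END {
      if (status == "OK" && timeout == "No") {
        print "Learn cycle OK: no further action required"
        exit 0
      }
      printf "Learn cycle status=%s timeout=%s: open a Service Request with this output\n", status, timeout
      exit 1
    }
  '
}
```

For example, CmdTool2 -AdpBbuCmd -GetBbuStatus -a0 | bbu_verdict exits 1 for the failed output above and 0 for the OK output.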