Article Number: 520895


RecoverPoint: RPA down for unknown reason

Primary Product: RecoverPoint Gen5 Server

Product: RecoverPoint Gen5 Server

Last Published: 24 Mar 2020

Article Type: Break Fix

Published Status: Online

Version: 4


Article Content

Issue


RecoverPoint GUI reported "RPA down"
RecoverPoint Appliance rebooted unexpectedly

This is a hardware-related issue with Intel-based RPAs. The IPMI event logs show that an SMI timeout occurred around the time of the RPA reboot.
From log: files/home/kos/kbox/utilities/regulate_reboot/detailed_startup_information.txt
reboot      Tue Dec 6 21:27:46 UTC 2016
 
From log: processes/usr/bin/ipmitoolselelist or files/home/kos/sel/sel.log
59 | 12/06/2016 | 21:16:33 | Unknown SMI Timeout | State Asserted
5a | 12/06/2016 | 21:16:58 | Unknown SMI Timeout | State Asserted
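
The correlation above can be sketched as a small script. This is only an illustrative sketch, not a RecoverPoint tool: the function names and the 30-minute window are assumptions made for the example, and it assumes the SEL timestamps and the reboot record use the same time zone and the line formats shown in the excerpts above.

```python
from datetime import datetime, timedelta


def parse_reboot_line(line):
    """Parse a line such as 'reboot      Tue Dec 6 21:27:46 UTC 2016'."""
    parts = line.split()
    if len(parts) != 7 or parts[0] != "reboot":
        return None
    # parts: reboot, weekday, month, day, time, zone, year
    text = " ".join([parts[2], parts[3], parts[4], parts[6]])
    return datetime.strptime(text, "%b %d %H:%M:%S %Y")


def parse_sel_line(line):
    """Parse 'id | MM/DD/YYYY | HH:MM:SS | sensor | state' into a tuple."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 5:
        return None
    try:
        stamp = datetime.strptime(fields[1] + " " + fields[2],
                                  "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return stamp, fields[3], fields[4]


def smi_timeouts_before_reboot(sel_lines, reboot_time, window_minutes=30):
    """Return timestamps of SMI Timeout events logged shortly before a reboot.

    window_minutes is an arbitrary value chosen for this example.
    """
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in sel_lines:
        parsed = parse_sel_line(line)
        if parsed is None:
            continue
        stamp, sensor, _state = parsed
        age = reboot_time - stamp
        if "SMI Timeout" in sensor and timedelta(0) <= age <= window:
            hits.append(stamp)
    return hits


if __name__ == "__main__":
    sel = [
        "59 | 12/06/2016 | 21:16:33 | Unknown SMI Timeout | State Asserted",
        "5a | 12/06/2016 | 21:16:58 | Unknown SMI Timeout | State Asserted",
    ]
    reboot = parse_reboot_line("reboot      Tue Dec 6 21:27:46 UTC 2016")
    for stamp in smi_timeouts_before_reboot(sel, reboot):
        print("SMI timeout at", stamp, "preceded reboot at", reboot)
```

With the excerpts above, both SEL entries fall roughly 11 minutes before the recorded reboot, which is the pattern this article describes.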

Affected versions: All versions running on Intel hardware RPAs

 
Cause
According to Intel documentation: "SMI stands for system management interrupt and is an interrupt that gets generated so the processor can service server management events (typically memory or PCI errors, or other forms of critical interrupts). If this interrupt times out the system is frozen." In this case, the SMI timeout froze the system, which caused the RPA to go down.
Resolution
The affected RPA should recover on its own after approximately 6-10 minutes. If it does not, power cycle the RPA.
Monitor the affected RPA; if the issue persists, the RPA will need to be replaced.
 
Notes


Article Attachments

Attachments


Article Properties

First Published

Tue May 08 2018 02:57:21 GMT

