Dell Unity: SP Panic after Being Up for More than 240 Days
Summary: Dell Unity XT 480, 680, or 880 Storage Processor (SP) may panic after being up for more than 240 days. (Dell Correctable)
Symptoms
A Dell Unity XT 480, 680, or 880 SP may panic after being up for more than 240 days. Other Dell Unity systems can experience the issue only after a much longer uptime (greater than 730 days).
UDoctor may generate an alert on any code version below 5.3 where the SP has been running for more than 240 days, and that alert references this KB article. For more details about the UDoctor alert, see the KB article "Dell Unity: Critical Alert 640003 Occurring on OE 5.2.1 or later, Where Storage Processor (SP) Uptime Panic Fix is already applied".
Cause
An SP panic may occur because of an integer overflow: a calculation produces a 64-bit result that is stored in a 32-bit variable.
The issue is most likely to occur on a Unity XT 480, 680, or 880 array running Unity OE versions 5.1.0.0.5.394 through 5.2.0.0.5.173. This is due to changes in those code versions, the SP hardware used in those models, and how the code interacts with that hardware.
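The kind of truncation described above can be illustrated with a minimal shell sketch. This is a generic demonstration only: the function name truncate32 and the sample values are hypothetical, and the actual variable, units, and code path in Unity OE are not described in this article. It assumes a shell with 64-bit arithmetic, as on typical Linux systems.

```shell
#!/bin/sh
# Generic illustration only; not Unity OE code.
# truncate32 is a hypothetical helper that keeps the low 32 bits of a
# value, which is what happens when a 64-bit result is stored in an
# unsigned 32-bit variable.
truncate32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

truncate32 4294967295   # largest value that fits in 32 bits: prints 4294967295
truncate32 4294967296   # one more than that wraps around: prints 0
truncate32 5000000000   # arbitrary larger value: prints 705032704
```

Once the stored value wraps, any code that relies on it sees a number that no longer matches the real result, which is the general class of failure the Cause section describes.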
Resolution
Fix:
The fix is available in Unity OE version 5.2.1.0.5.013 and later. However, Dell does not recommend upgrading to that specific code version. Instead, Dell strongly recommends that customers upgrade to the latest available code or, if the latest code is not the "target" code, at a minimum to the target code.
Also, the UDoctor utility identifies this issue on Unity OE versions below 5.3. This is because the fix was delivered in version 5.3 but was also backported to 5.2.1 and later code. As a result, the UDoctor alert can still trigger on the backported code even though the fix is already applied.
Workaround:
Proactively reboot each SP before its uptime exceeds 240 days to avoid an SP panic. Instructions to reboot an SP are available in the article "Unity: How to Reboot a Storage Processor (User Correctable)".
To check the SP uptime, connect to the array using SSH with the service account and run the "uptime" command.
The example below shows an uptime of 31 days.
04:30:01 service@xxx spa:~/user# uptime
04:30am up 31 days 3:41, 2 users, load average: 29.21, 29.45, 29.51
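The uptime check can be scripted. The sketch below is a hypothetical helper, not part of Unity OE or UDoctor: the function names and the 210-day warning margin are assumptions chosen for illustration, and only the 240-day threshold comes from this article. It parses a line of "uptime" output in the format shown above and reports whether a reboot should be scheduled.

```shell
#!/bin/sh
# Hypothetical helper; not part of Unity OE or UDoctor.
PANIC_DAYS=240   # threshold from this article
WARN_DAYS=210    # assumed warning margin, adjust as needed

# Print the whole days of uptime found in one line of `uptime` output.
# Prints 0 when the line has no "N day(s)" field (uptime under 24 hours).
uptime_days() {
    days=$(printf '%s\n' "$1" | sed -n 's/.*up  *\([0-9][0-9]*\) day.*/\1/p')
    printf '%s\n' "${days:-0}"
}

# Classify an uptime line as OK, WARN, or CRITICAL.
check_uptime() {
    d=$(uptime_days "$1")
    if [ "$d" -ge "$PANIC_DAYS" ]; then
        echo "CRITICAL: up $d days, reboot the SP or apply the fix"
    elif [ "$d" -ge "$WARN_DAYS" ]; then
        echo "WARN: up $d days, schedule a reboot before day $PANIC_DAYS"
    else
        echo "OK: up $d days"
    fi
}
```

For example, passing the output line above to check_uptime reports "OK: up 31 days". One possible use is to paste or capture the "uptime" output from each SP and pass it to check_uptime from an administrative workstation. Whether the helper can run directly in the service account shell is not confirmed here.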
The UDoctor alert refreshes every three days unless the above Fix or Workaround is implemented. Dell Technologies recommends that customers implement the Fix, or the Workaround if they cannot implement the Fix. If neither the Fix nor the Workaround can be implemented, the UDoctor check for this condition alone can be disabled.
Log in to the primary SP using SSH and issue the following command:
svc_udoctor --jobs --disable CalculateUptime
This stops the check from running every three days.