Dell Unity: Arrays on 5.3.0 with SupportAssist may experience a Storage Processor panic
Summary: Unity arrays running Unity OE 5.3.0.0.5.120 with SupportAssist enabled may experience an SP panic after approximately 2 months of uptime with a two-SCG configuration, or after approximately 4 months with a single-SCG or direct-connect configuration. ...
Symptoms
SupportAssist is configured and enabled for remote access.
Both direct connect and gateway configurations are affected.
SP panic and unexpected reboot after approximately 2 months of runtime with a two-SCG configuration.
SP panic and unexpected reboot after approximately 4 months of runtime with a single-SCG or direct-connect configuration.
Cause
The panic is expected to occur only on the primary SP, which runs the ESE process. After the panic, the resources are released.
Resolution
This issue is fixed in Unity Operating Environment (OE) version 5.3.1.0.5.008.
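As a rough illustration of the version guidance above, the following shell sketch classifies a Unity OE version string. The function names `is_affected` and `has_fix` are hypothetical and not part of any Dell tool, and the version string is supplied by hand; how it is obtained from the array is left out. The sketch assumes that any `5.3.0.` build is affected (the article names 5.3.0.0.5.120 explicitly) and that `sort -V` (a GNU coreutils extension) is available.

```shell
#!/bin/sh
# Illustrative sketch only; function names are hypothetical.

FIXED_VERSION="5.3.1.0.5.008"   # fixed release named in this article

# Succeeds when the version string belongs to the 5.3.0 family.
# Assumption: every 5.3.0.x build is treated as affected.
is_affected() {
    case "$1" in
        5.3.0.*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Succeeds when the version string is at or above the fixed release.
# Relies on version-aware sorting (sort -V, a GNU extension).
has_fix() {
    lowest=$(printf '%s\n%s\n' "$1" "$FIXED_VERSION" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED_VERSION" ]
}
```

For example, `is_affected 5.3.0.0.5.120` succeeds, while `has_fix 5.3.0.0.5.120` fails and `has_fix 5.3.1.0.5.008` succeeds.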
Workaround:
Multiple workarounds are available; see the Additional Information section for the detailed steps for each workaround.
Additional Information
Workaround Option #1:
Restarting SupportAssist once the number of zombie (defunct) curl processes has grown large clears them and prevents an SP panic. The recommended threshold for restarting SupportAssist is 5,000 zombie curl processes. The commands for checking the number of zombie (defunct) curl processes and restarting SupportAssist are shown below.
14:01:20 service@none spb:~/user# ps -ef |grep curl|grep defunct|wc -l
4702 <----------------Current Number of zombie curl processes
14:01:52 service@none spb:~/user# svc_supportassist --restart
Restart in progress........Completed!
14:03:59 service@none spb:~/user# svc_supportassist --status
State: Running
Type: Connect through a gateway server
Connectivity: Reachable
Primary gateway: https://1.2.3.4:9443 (Reachable)
Remote Access: Yes
RSC Enabled: No
Version: 4.7.7.21
Initialized: Yes
Proxy mode: none
14:04:22 service@none spb:~/user# ps -ef |grep curl|grep defunct|wc -l
0 <----------------- Number of zombie curl processes after SupportAssist restart
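The manual check above could be wrapped in a small script. This is a minimal sketch, not a Dell-provided tool: the function names `count_defunct_curl` and `needs_restart` are hypothetical, and the restart call is left commented out so that the sketch itself has no side effects. It uses the same `ps -ef` filter and the 5,000 threshold given in this article.

```shell
#!/bin/sh
# Illustrative sketch only; function names are hypothetical.

THRESHOLD=5000   # restart threshold recommended in this article

# Count defunct curl processes in `ps -ef` style output read from stdin.
# `|| true` keeps the exit status at 0 when the count is zero.
count_defunct_curl() {
    grep curl | grep -c defunct || true
}

# Succeeds when the given count has reached the threshold.
needs_restart() {
    [ "$1" -ge "$THRESHOLD" ]
}

# Possible use on the SP (commented out deliberately):
# count=$(ps -ef | count_defunct_curl)
# if needs_restart "$count"; then
#     svc_supportassist --restart
# fi
```

Reading the process list from stdin keeps the counting logic separate from the restart action, so the filter can be checked against saved `ps -ef` output before anything is restarted.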
Workaround Option #2:
A new UDoctor script (udoctor_update_supportassist) has been developed and is being made available to connected Unity arrays in a staggered rollout. If accepted and installed, the new UDoctor script clears any existing zombie (defunct) curl processes and prevents new ones from accumulating.
The UDoctor script is pushed automatically to systems which have call home enabled and which call home and indicate they have 5.3.0 installed. In the past, priority was given to systems which had high numbers of zombie processes, but that priority has been eliminated and we are now accelerating the rollout to all systems which connect home indicating 5.3.0. Once the package has been pushed to your system, you will see an alert stating that a UDoctor package is now available for installation.
UDoctor packages are used to apply targeted updates, workarounds, and configuration changes to the Unity array, independent of a full software OE upgrade.
See the knowledge article "Dell Unity: UDoctor package (xxxxxx) is now available for installation. (User Correctable)" for how to identify whether a new UDoctor package is available and how to accept and install it.
NOTE 1:
When an upgrade (NDU) of the Unity OE is performed, it overwrites any changes made by the UDoctor package. This means that when the software fix becomes available in a new Unity OE release, a standard NDU can be performed and no additional steps are required.
NOTE 2:
There is no way to override the inventory/push process and force the UDoctor package to be pushed to any particular Unity system. The inventory/push process occurs weekly. For customers who want the fix sooner, the correct solution is to upgrade to Unity OE version 5.3.1.0.5.008 (5.3 SP1). Alternatively, customers can use Workaround Option #1 above.