PowerScale iDRAC is Experiencing Connectivity Issues
Summary: PowerScale F-, P-, and B-series nodes generate an iDRAC connectivity event on either the first day of every month or every Monday.
Symptoms
PowerScale F200, F600, F900, P100, or B100 nodes generate the following alert on the first day of every month or every Monday:
47.693031 11/01 00:20 C 3 1076769 The Integrated Dell Remote Access Controller (iDRAC) located in chassis XXXXXXX is experiencing connectivity problems. This controller monitors hardware components such as batteries and power supplies. To ensure these hardware components continue to be monitored, contact Dell EMC support as soon as possible.
Cause
A cron job called isi_security_checker runs on the cluster by default, either on the first of every month or every Monday morning; which schedule applies depends on the installed OneFS version. With default settings, this job can overload the iDRAC, triggering these messages.
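The workaround below reduces this load by running the checker against one node at a time instead of firing it on every node at once. A minimal local sketch of that serialization pattern (check_node is a stub standing in for the real isi_security_checker binary, and the list "1 2 3" stands in for the node list that isi_nodes %{lnn} would produce on a cluster):

```shell
# Sketch: run a per-node check serially rather than in parallel.
# check_node is a hypothetical stub for isi_security_checker,
# which only exists on OneFS nodes.
check_node() {
    echo "checking node $1"
}

# "1 2 3" stands in for: $(isi_nodes %{lnn})
for i in 1 2 3; do
    check_node "$i"
done
```

Because the loop waits for each node's check to finish before starting the next, at most one node queries its iDRAC at any moment.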
Resolution
The permanent fix for this is in the latest Health Check Framework (HCF) patch.
If you need assistance with implementing the workaround, contact Dell Technical Support and quote this article ID.
Workaround:

1. On the cluster, create a file called "security_checker.sh" under /ifs/data/Isilon_Support/, adding the following entry within:

   for i in $(isi_nodes %{lnn}); do /usr/bin/isi_security_check/isi_security_checker -n $i --node-only; done

2. Open and edit /etc/mcp/templates/crontab, comment out (#) the current isi_security_checker job, and add a new entry to run the file you created above. The new entry must run using isi_ropc -s -H, which must be passed through a shell, since /ifs is mounted noexec.

   #20 0 1 * * root /usr/bin/isi_security_check/isi_security_checker
   20 0 1 * * root /usr/bin/isi_ropc -s -H /usr/local/bin/zsh /ifs/data/Isilon_Support/security_checker.sh
3. Verify that all nodes have the updated changes (the output should match the entries from step 2).

   # isi_for_array -sX "grep security_checker /etc/crontab"
4. Confirm that all nodes have the same MD5 hash for the /etc/mcp/templates/crontab file.

   # isi_for_array -sX "md5 /etc/mcp/templates/crontab"

   If the MD5 hash is different for any node, copy the /etc/mcp/templates/crontab file you modified to /ifs/data/Isilon_Support. Log in to the node with the different MD5 value, and copy /ifs/data/Isilon_Support/crontab over the existing file. Verify that the permissions remain 640 (-rw-r-----).
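The step above passes when every node reports one identical hash. The same consistency check can be sketched locally with ordinary files (OneFS uses the BSD md5 command, while this sketch uses md5sum; node1.crontab and node2.crontab are illustrative stand-ins for the per-node copies of /etc/mcp/templates/crontab):

```shell
# Sketch: detect whether a set of files all share one MD5 hash.
# The two files below are hypothetical per-node crontab copies.
printf '20 0 1 * * root /bin/true\n' > node1.crontab
printf '20 0 1 * * root /bin/true\n' > node2.crontab

# Count distinct hashes; exactly 1 means all copies match.
distinct=$(md5sum node1.crontab node2.crontab | awk '{print $1}' | sort -u | wc -l)
if [ "$distinct" -eq 1 ]; then
    echo "crontab files match"
else
    echo "crontab files differ"
fi
```

With identical file contents, the script reports that the crontab files match; any node whose copy diverges would raise the distinct-hash count above 1.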
5. Gather the current process ID (PID) for cron:

   # isi_for_array -sX "ps -auxww | grep cron | grep -v grep"

   Example:

   LAB-1# isi_for_array -sX "ps -auxww | grep cron | grep -v grep"
   LAB-1: root 3140 0.0 0.0 25488 13016 - Is 6Oct24 0:14.15 /usr/sbin/cron -s
   LAB-2: root 3144 0.0 0.0 25488 13016 - Is 6Oct24 0:14.39 /usr/sbin/cron -s
   LAB-3: root 3173 0.0 0.0 25488 13016 - Is 6Oct24 0:14.14 /usr/sbin/cron -s
6. Restart cron on the cluster.

   # isi_for_array -sX "/etc/rc.d/cron restart"

   Example:

   LAB-1# isi_for_array -sX "/etc/rc.d/cron restart"
   LAB-1: Stopping cron.
   LAB-1: Waiting for PIDS: 3140.
   LAB-1: Starting cron.
   LAB-2: Stopping cron.
   LAB-2: Waiting for PIDS: 3144.
   LAB-2: Starting cron.
   LAB-3: Stopping cron.
   LAB-3: Waiting for PIDS: 3173.
7. If you receive Exit status 1 on any node, restart cron on that node. Gather the current PID for cron and confirm that the process restarted on all nodes by following the steps outlined in step 5 (the PIDs should have changed).
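The PID comparison in the final step can be scripted: capture the ps output before and after the restart, then flag any node whose cron PID is unchanged. A minimal sketch of that comparison, using two hypothetical saved captures of the isi_for_array ps output as input:

```shell
# Sketch: verify that cron's PID changed on every node after restart.
# before.txt / after.txt are hypothetical captures of:
#   isi_for_array -sX "ps -auxww | grep cron | grep -v grep"
cat > before.txt <<'EOF'
LAB-1: root 3140 0.0 0.0 25488 13016 - Is 6Oct24 0:14.15 /usr/sbin/cron -s
LAB-2: root 3144 0.0 0.0 25488 13016 - Is 6Oct24 0:14.39 /usr/sbin/cron -s
EOF
cat > after.txt <<'EOF'
LAB-1: root 9001 0.0 0.0 25488 13016 - Is 7Oct24 0:00.01 /usr/sbin/cron -s
LAB-2: root 9002 0.0 0.0 25488 13016 - Is 7Oct24 0:00.01 /usr/sbin/cron -s
EOF

# Field 1 is the node name, field 3 the PID. Count nodes whose PID
# did not change between the two captures; 0 means the restart took
# effect everywhere.
unchanged=$(awk 'NR==FNR {before[$1]=$3; next}
                 before[$1]==$3 {n++}
                 END {print n+0}' before.txt after.txt)
echo "nodes with unchanged cron PID: $unchanged"   # prints: nodes with unchanged cron PID: 0
```

Any node still listing its old PID after the restart should have cron restarted again, as described in step 7.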