Dell VxRail: health-check 'ism_fix' or 'rac_fix' correcting iSM and iDRAC issues

Summary: VxVerify on VxRail Manager can attempt to correct iDRAC and iSM faults by restarting iDRAC and the related VxRail node services.

Symptoms

VxVerify on VxRail Manager can attempt to correct iDRAC and iSM faults by restarting iDRAC and the related VxRail node services.
Before running tests directly on each node with the VxVerify minion, VxVerify on VxRail Manager first queries the Dell iSM service (dcism or dellism).
Alternatively, if iDRAC issues are found when the health-checks run, this Autofix is attempted before the health-checks are retried.
If the Autofix option is enabled (either by the test profile or with the --fix argument), the correction attempt takes around 10 minutes.

The result of this auto-correction is listed as one of the following:
Test Result   Result code   Result interpretation
Pass          0             Correcting iSM status was either unnecessary or not enabled under the test profile.
Warning       1             Dell iSM status was running correctly after the restart.
Failure       2             Dell iSM and iDRAC were restarted, but iSM was still not running correctly afterwards.
Critical      3             This test has no critical result.
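For scripts that post-process VxVerify results, the result table can be captured as a small lookup. This is a minimal Python sketch. The codes and wording come from the table, but the names `IsmFixResult`, `interpret` and `needs_follow_up` are hypothetical. This article also does not describe how a script would obtain the numeric result code from VxVerify.

```python
from enum import IntEnum


class IsmFixResult(IntEnum):
    """Hypothetical names for the ism_fix / rac_fix result codes in the table above."""
    PASS = 0
    WARNING = 1
    FAILURE = 2
    CRITICAL = 3


# Interpretations copied from the result table in this article.
INTERPRETATION = {
    IsmFixResult.PASS: "Correcting iSM status was either unnecessary or not enabled under the test profile.",
    IsmFixResult.WARNING: "Dell iSM status was running correctly after the restart.",
    IsmFixResult.FAILURE: "Dell iSM and iDRAC were restarted, but iSM was still not running correctly afterwards.",
    IsmFixResult.CRITICAL: "This test has no critical result.",
}


def interpret(code: int) -> str:
    """Return the documented meaning of a result code (raises ValueError if unknown)."""
    return INTERPRETATION[IsmFixResult(code)]


def needs_follow_up(code: int) -> bool:
    """A Warning means the restart was done and iSM recovered; only a Failure leaves iSM not running."""
    return IsmFixResult(code) is IsmFixResult.FAILURE
```

In this test, a Warning (1) indicates that the Autofix ran and iSM is running afterwards. It is therefore a different kind of Warning from one that reports an unresolved fault.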
This fix can also be triggered after the VxVerify checks run, if iDRAC queries fail. In that case, the VxVerify minion runs a second time after the fix, and the repeated tests check whether the iSM and iDRAC issues are fixed.
For ease of reading, tests that pass are not listed in the summary report.
An example of the health-check output is shown below:
#========================#======#=========#====================================================================#==============#
|  Hostname / Category   |Status  Dell_KB |  Warnings or Failures, unless tests Passed                         ; Product S.N. |
#========================#======#=========#====================================================================#==============#
| _cluster               | Warning 205179 | ism_fix: iSM and iDRAC fixed for node1.lab.local, node4.lab.local                .|
|   ``                   | Warning 205179 | rac_fix: iSM and iDRAC fixed for node2.lab.local                                  |
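To find which nodes were fixed, you can scan the summary rows shown above. The following is a hedged Python sketch. The helper name `fixed_nodes` is hypothetical, and the regular expression assumes exactly the row layout in this example, which may differ between VxVerify versions.

```python
import re

# Assumed row layout, taken from the summary example above:
#   "| ... | Warning 205179 | ism_fix: iSM and iDRAC fixed for node1, node4   .|"
FIX_ROW = re.compile(r"\b(ism_fix|rac_fix): iSM and iDRAC fixed for ([^|]+?)\s*\.?\s*\|")


def fixed_nodes(summary_text: str) -> dict:
    """Map each fix name ('ism_fix' or 'rac_fix') to the nodes it reports as fixed."""
    result = {}
    for line in summary_text.splitlines():
        match = FIX_ROW.search(line)
        if match:
            nodes = [n.strip() for n in match.group(2).split(",") if n.strip()]
            result.setdefault(match.group(1), []).extend(nodes)
    return result
```

Because passing tests are omitted from the summary, an empty result means that no fix was reported. It does not show whether the Autofix was enabled.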

The 'ism_fix' operation runs before the minions, and the fix commands are run remotely from VxRail Manager (VxRM) over SSH. For example:
Running VxVerify 3.21.108, pre-upgrade healthcheck on VxRail 7.0.372.
In case of program errors consult article https://www.dell.com/support/kbdoc/000066460.
Step 1: Fixing iSM issue, prior to running health-checks, on node: lab-08-esxi-01.lab.local
Step 1: Fixing iSM issue, prior to running health-checks, on node: lab-08-esxi-02.lab.local
Step 1: Stopping ISM and platform service on lab-08-esxi-01.lab.local
Step 1: Stopping ISM and platform service on lab-08-esxi-02.lab.local
Step 1: Pausing for 266 seconds more after iDRAC restarted on ['lab-08-esxi-01.lab.local', 'lab-08-esxi-02.lab.local'] 
... 
Step 1: Starting iSM on lab-08-esxi-01.lab.local
Step 1: Starting iSM on lab-08-esxi-02.lab.local
Step 1: Pausing for 84 seconds more after Dell iSM started on ['lab-08-esxi-01.lab.local', 'lab-08-esxi-02.lab.local']
...
Step 1: Starting Platform service on lab-08-esxi-01.lab.local
Step 1: Starting Platform service on lab-08-esxi-02.lab.local
The Autofix can also be seen in the vxv.log before the minion_run events:
2022-11-11 09:51:26-INFO     [ism_fix] Fixing phase 1 Dell ISM on node on lab-08-esxi-01.lab.local
2022-11-11 09:51:31-INFO     [ism_fix] lab-08-esxi-01.lab.local Auto-fix continuing with vSAN objecthealth: green
2022-11-11 09:51:32-INFO     [ism_fix] iDRAC restarting on lab-08-esxi-01.lab.local: _
...
2022-11-11 09:58:58-INFO     [ism_fix] Checking hosts for auto-fix success: ['lab-08-esxi-01.lab.local', 'lab-08-esxi-02.lab.local']
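To pull the Autofix events out of a long vxv.log, you can filter on the `[ism_fix]` tag. This is a minimal Python sketch. The line layout (timestamp, `-LEVEL`, `[tag]`, message) is inferred only from the excerpts in this article, and the function name `autofix_events` is hypothetical.

```python
import re
from datetime import datetime

# Assumed vxv.log line layout, inferred only from the excerpts in this article:
#   "YYYY-MM-DD HH:MM:SS-LEVEL    [tag] message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})-(?P<level>[A-Z]+)\s+"
    r"\[(?P<tag>[^\]]+)\]\s+(?P<msg>.*)$"
)


def autofix_events(lines, tag="ism_fix"):
    """Yield (timestamp, level, message) for log lines carrying the given [tag]."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("tag") == tag:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            yield ts, match.group("level"), match.group("msg")
```

For example, with a copy of vxv.log in the current directory, `list(autofix_events(open("vxv.log", encoding="utf-8")))` returns the `[ism_fix]` entries in order. The difference between the first and last timestamps gives a rough duration for the Autofix. Lines that do not match the assumed layout, such as `...`, are skipped.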

Cause

When dcism is not running, and this auto-remediation feature is enabled in the test profile, VxVerify performs the following steps to correct it:
  • Stop services: sfcbd, dcism, PTAgent (if present) and Platform-service
  • Restart iDRAC, then wait 5 minutes for iDRAC to come back online
  • Start the services listed above
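The sequence above can be sketched in Python to show the order of operations. This is a hypothetical outline, not VxVerify's implementation. `stop_service`, `start_service`, `restart_idrac` and `has_service` are placeholder callables that the caller must supply (for example, wrappers around SSH commands). No real ESXi or iDRAC commands are shown.

```python
import time
from typing import Callable

# Service order and wait time taken from the steps above.
SERVICES = ["sfcbd", "dcism", "PTAgent", "Platform-service"]
IDRAC_WAIT_SECONDS = 5 * 60


def remediate(node: str,
              stop_service: Callable[[str, str], None],
              start_service: Callable[[str, str], None],
              restart_idrac: Callable[[str], None],
              has_service: Callable[[str, str], bool],
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Hypothetical sketch of the ism_fix order of operations on one node."""
    # Only act on services that exist on this node (PTAgent may be absent).
    present = [svc for svc in SERVICES if has_service(node, svc)]
    for svc in present:
        stop_service(node, svc)
    restart_idrac(node)
    sleep(IDRAC_WAIT_SECONDS)  # give iDRAC time to come back online
    for svc in present:
        start_service(node, svc)
```

If you pass recording functions as a dry run, you can see the order without touching a node. The console output earlier in this article also shows a further pause after iSM starts and before the Platform service starts. This sketch does not model that pause.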

Resolution

The Autofix for iSM reports its success or failure based on the ‘dcism’ or ‘dellism’ status when VxVerify polls it remotely. The minions are then started as normal.
The iSM status is then retested by the ‘dcism’ health-check, which runs directly on that node. This check polls a few minutes after the Autofix, so it can report a different result. If the results differ, treat the ‘dcism’ test as the more accurate indication of iSM status.

The results of the commands that start the services can be found in the vxv.log (see article 66460: VxVerify Troubleshooting Guide).
2022-11-25 09:16:26-DEBUG    [ism_fix] node-04.lab.local iSM start: _
2022-11-25 09:18:26-DEBUG    [ism_fix] node-04.lab.local Platform service start: Starting Platform Service Daemon. Check hostd status. hostd is ready. Platform Service started.
2022-11-25 09:18:26-INFO     [ism_fix] Checking hosts for auto-fix success: ['node-04.lab.local']
2022-11-25 09:18:26-INFO     [ism_check] Querying DC or Dell ISM status on host
2022-11-25 09:18:26-INFO     [ism_check] iSM status on node-04.lab.local : iSM is active (running)
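To check the post-fix result per node, you can read the `[ism_check]` status lines like the last one above. This is a hedged Python sketch with hypothetical function names. It treats only the text "active (running)" as healthy, because this article does not show the wording of a failed status.

```python
import re

# Matches the ism_check result line shown above, e.g.
#   "[ism_check] iSM status on node-04.lab.local : iSM is active (running)"
STATUS_LINE = re.compile(r"\[ism_check\] iSM status on (?P<node>\S+) : (?P<status>.+)$")


def ism_status_by_node(lines):
    """Return {node: status text}; a later line overwrites an earlier one for the same node."""
    status = {}
    for line in lines:
        match = STATUS_LINE.search(line.rstrip("\n"))
        if match:
            status[match.group("node")] = match.group("status").strip()
    return status


def nodes_not_running(status):
    """Nodes whose last status lacks 'active (running)' (assumed healthy marker)."""
    return sorted(node for node, text in status.items() if "active (running)" not in text)
```

Any node returned by `nodes_not_running` is a candidate for the 'dcism' troubleshooting article referenced below.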

If iSM cannot be fixed by the steps above (which the health-check can run automatically), see the article: Dell VxRail: Node health-check fails for test 'dcism'.

Additional Information

Force use of ism_fix (iDRAC restart)

The Autofix runs if 'dcism' or 'dellism' is not running when queried from VxRM, but only if the Autofix is enabled by the test profile or the --fix argument.
An iDRAC restart may also be recommended to address other issues, so the Autofix can also be forced with a VxVerify argument.
This is a safer way to recover iDRAC communication than a restart directly from the iDRAC UI. VxVerify shuts down iSM and the related services before restarting iDRAC, and then brings the services back up in the correct order.
The override argument can request a staggered iDRAC restart either on all nodes or on a list of specific nodes.

To apply the fix to nodes even if iSM is running normally (this restarts iDRAC and the related services):

  • Either force the iSM and iDRAC restart procedure ('ism_fix') on all nodes:

./vxverify.sh -a ism_fix=all
  • Or apply 'ism_fix' to specific nodes, given as a comma-separated list with no spaces (either short or fully qualified names work):

python vxverify3.pyc <any_other_arguments> -a ism_fix=lab-08-esxi-01,lab-08-esxi-02

The examples above show the shell and Python methods of running VxVerify, but the arguments work with either method.
The -a argument (--additional-params) accepts an unlimited number of argument pairs, so it must come after all other standard arguments, such as --verbose.
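If you script the override, the two rules above matter: the node list must contain no spaces, and -a must come last. This hedged Python sketch builds the argument list. The function name `ism_fix_command` is hypothetical, and it only assembles the command. It does not run VxVerify.

```python
import shlex


def ism_fix_command(nodes=None, other_args=(), use_python=False):
    """Build a VxVerify command line that forces ism_fix (hypothetical helper).

    nodes=None means all nodes ("ism_fix=all"); otherwise short or fully
    qualified names are joined with commas and no spaces. The -a pair is
    placed after every standard argument, as the article requires.
    """
    if nodes is None:
        value = "all"
    else:
        names = [n.strip() for n in nodes]
        if not names or any(not n or " " in n or "," in n for n in names):
            raise ValueError("node names must be non-empty and contain no spaces or commas")
        value = ",".join(names)
    base = ["python", "vxverify3.pyc"] if use_python else ["./vxverify.sh"]
    return [*base, *other_args, "-a", f"ism_fix={value}"]
```

For example, `shlex.join(ism_fix_command(["lab-08-esxi-01", "lab-08-esxi-02"], ["--verbose"], use_python=True))` gives `python vxverify3.pyc --verbose -a ism_fix=lab-08-esxi-01,lab-08-esxi-02`. This matches the second example above, with --verbose ahead of -a.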

When this argument is used, the override can be seen in the vxv.log as follows:

INFO [ism_fix] Running fix for Dell ISM on node: lab-08-esxi-01, due to override argument: lab-08-esxi-01.lab.local,lab-08-esxi-02.lab.local
or 
INFO [ism_fix] Running fix for Dell ISM on node: lab-08-esxi-02, due to override argument: all 

Affected Products

VxRail, iDRAC Service Module, VxRail Appliance Family, VxRail Appliance Series, VxRail Software
Article Properties
Article Number: 000205179
Article Type: Solution
Last Modified: 18 Dec 2024
Version:  12