ECS: RAP154: pmon entry missing in crontab in 3.8.0.1 due to regression
Summary: For systems installed or upgraded to ECS 3.8.0.1 General Availability (GA), pmon is not scheduled to run in crontab.
Symptoms
- High load on nodes and unready DTs observed
Cause
- In the ECS 3.8.0.1 General Availability (GA) release, the pmon entry is missing from crontab.
- The pmon service is responsible for restarting low-priority services. Cron runs the scheduled jobs listed in crontab, so without the entry pmon is never started.
- Not having the pmon service start on a scheduled basis may cause service issues on the nodes.
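The cause above comes down to one missing crontab line. As a minimal sketch only, the check could look like the following in Python. The article does not show the real pmon entry, so the crontab contents and paths below are hypothetical placeholders, and `has_pmon_entry` is an illustrative helper, not part of ECS or xDoctor.

```python
def has_pmon_entry(crontab_text: str) -> bool:
    """Return True if any active (non-comment) crontab line mentions pmon."""
    for raw in crontab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines; they schedule nothing.
        if not line or line.startswith("#"):
            continue
        if "pmon" in line:
            return True
    return False


# Hypothetical crontab contents, for illustration only.
healthy = "*/5 * * * * /hypothetical/path/pmon.sh\n"
affected = "# pmon entry removed\n0 * * * * /hypothetical/path/other_job.sh\n"

print(has_pmon_entry(healthy))
print(has_pmon_entry(affected))
```

The supported fix remains the xDoctor autofix described under Resolution; this sketch only illustrates what "missing from crontab" means.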
Resolution
Notes:
- ECS 3.8.0.1 General Patch 1 (GP1), released on February 2, 2023, is not affected by this problem. The steps below are not needed after installing or upgrading to GP1.
- This procedure needs to run on every Virtual Data Center (VDC) separately.
- If a rack expansion is performed on a VDC while on ECS 3.8.0.1 GA, these steps need to be followed on the new rack.
- If xDoctor is uninstalled from a rack for any reason, the configuration is lost and the following steps must be re-applied.
- Confirm your ECS version and xDoctor version by logging in through the CLI and running the following command:
Command:
# svc_version -x

Expected Output Similar To:
svc_version v1.4.5 (svc_tools v2.8.0) Started 2023-01-23 04:35:25
ECS Version: 3.8.0.1 GA (DARE)
Object Version 3.8.0.1-138598.3d5db7c96f2
OS Version 3.8.0.0-2076.a7e36fa.36
Fabric Version 3.8.0.0-4343.878ca95
Fabric-agent Version 3.8.0.0-4343.878ca95
Syslog Version <Unknown>
Zookeeper Version 3.8.0.0-119.78667ce
Registry Version 2.3.1.0-82.c8163d2
Utilities Version 3.8.0.0-4343.878ca95
Service Console Version 8.0.0.0-22206.f9c9f74a6c
xDoctor Version 4.8-87.0
svc_tools Version 2.8.0
- For ECS version:
- If the version is anything other than "3.8.0.1 GA" (for example: "3.8.0.1 GP1" or "ECS 3.8.0.1 IP"), there is no need to follow the rest of this KB. Only follow the KB if the output is "3.8.0.1 GA".
- All VDCs in the Federation (i.e. VDCs part of the same replication group) need to be on the same version. If this is not the case, then open a case with Support.
- For xDoctor version: If the version is less than 4.8-89.0, then you need to upgrade xDoctor as follows:
- Log in to the Dell Support Site and search for xDoctor using the top search bar.
- To download the xDoctor code and the xDoctor release notes: Click "Downloads & Drivers" under "Resources" section on the left side of the page, and the latest xDoctor version xDoctor4ECS 4.8-xx will appear for download along with the release notes.
- If the xDoctor release notes are not available under "Downloads & Drivers", click "Manuals & Documents" under the "Resources" section on the left side of the page, and "xDoctor Release Notes 4.8-xx" will appear for download. Choose the release notes that match the xDoctor version you downloaded.
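The two version checks above (ECS must be exactly "3.8.0.1 GA", and xDoctor must be at least 4.8-89.0) can be sketched in Python as below. This is an illustration under assumptions, not a Dell tool: it assumes the `svc_version -x` output has one field per line as in the sample, and the function names are hypothetical.

```python
import re

# Minimum xDoctor version named in this article: 4.8-89.0
MIN_XDOCTOR = (4, 8, 89, 0)


def parse_svc_version(output: str) -> dict:
    """Pull the ECS release string and xDoctor version out of svc_version -x text."""
    ecs = re.search(r"ECS Version:\s*(.+)", output)
    xdoc = re.search(r"xDoctor Version\s+(\d+(?:[.-]\d+)*)", output)
    return {
        "ecs": ecs.group(1).strip() if ecs else None,
        "xdoctor": xdoc.group(1) if xdoc else None,
    }


def kb_applies(ecs_version: str) -> bool:
    """True only for the GA build; GP1, IP and other builds are out of scope."""
    return re.match(r"3\.8\.0\.1 GA\b", ecs_version) is not None


def xdoctor_needs_upgrade(version: str) -> bool:
    """Compare a version such as '4.8-87.0' against the 4.8-89.0 minimum."""
    parts = [int(p) for p in re.split(r"[.-]", version)]
    parts += [0] * (4 - len(parts))  # pad short forms such as '4.8-89'
    return tuple(parts) < MIN_XDOCTOR


sample = """ECS Version: 3.8.0.1 GA (DARE)
Object Version 3.8.0.1-138598.3d5db7c96f2
xDoctor Version 4.8-87.0
"""
info = parse_svc_version(sample)
print(kb_applies(info["ecs"]), xdoctor_needs_upgrade(info["xdoctor"]))
```

With the sample output shown in this article, both checks come back true: the system is on GA, and xDoctor 4.8-87.0 is older than the required 4.8-89.0.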
- Verify how many racks you have. If you have more than one rack, the following steps must be applied on each rack by connecting to that rack's 169.254.x.1 IP address.
Command:
# getclusterinfo

Example of a single-rack VDC:
Registered Racks
================
Ip Address      epoxy seg mac           seg color  seg id  NAN Hostname
=============== ===== ================= ========== ======= ============
169.254.1.1     False AA:BB:CC:DD:EE:FF red        1       provo-red.nanlocal

Example of a two-rack VDC:
Registered Racks
================
Ip Address      epoxy seg mac           seg color  seg id  NAN Hostname
=============== ===== ================= ========== ======= ============
169.254.1.1     False AA:BB:CC:DD:EE:FF red        1       provo-red.nanlocal
169.254.2.1     False AA:BB:CC:DD:EE:00 green      2       provo-green.nanlocal
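Because the procedure has to be repeated on every rack, it can help to list the rack addresses from the `getclusterinfo` output. The Python sketch below assumes the "Registered Racks" table has one rack per line, starting with its 169.254.x.1 address, as in the examples above. `rack_ips` is a hypothetical helper name.

```python
import re
from typing import List


def rack_ips(getclusterinfo_output: str) -> List[str]:
    """Return the 169.254.x.1 rack addresses that start a table row."""
    return re.findall(
        r"^\s*(169\.254\.\d{1,3}\.1)\b",
        getclusterinfo_output,
        flags=re.MULTILINE,
    )


sample = """Registered Racks
================
Ip Address      epoxy seg mac           seg color  seg id  NAN Hostname
=============== ===== ================= ========== ======= ============
169.254.1.1     False AA:BB:CC:DD:EE:FF red        1       provo-red.nanlocal
169.254.2.1     False AA:BB:CC:DD:EE:00 green      2       provo-green.nanlocal
"""
racks = rack_ips(sample)
print(len(racks), racks)
```

Each address returned is a rack where the xDoctor steps below would need to be run.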
- After upgrading xDoctor to version 4.8-89 or above:
- Open the xDoctor configuration menu by running "sudo xdoctor --config".
- Enter "7" to choose "Autofixes"
- Enter "1" to choose "Change Autofix Status"
- For "Enable time_zone? [Yes]", press Enter without typing any values to keep the default "Yes".
- For "Enable pmon_crontab_check? [No]", type "y" or "yes", then press Enter to enable the Autofix for the pmon issue.
- Verify that both options are showing as "Enabled".
- For "Issue new Settings? [No]", type "y" or "Yes" and press Enter.
Command:
# sudo xdoctor --config

Expected Output Similar To:
admin@provo-orchid:~> sudo xdoctor --config
┌────────────────────────────┐
│ xDoctor Configuration Menu │
└───┬────────────────────────┘
┌───┼──────────┐
│ 1 │ Overview │
└───┼──────────┘
┌───┼────────────────────┐
│ 2 │ Event Notification │
└───┼────────────────────┘
┌───┼─────────────┐
│ 3 │ Auto Update │
└───┼─────────────┘
┌───┼────────────────┐
│ 4 │ Data Scrubbing │
└───┼────────────────┘
┌───┼─────────────────────┐
│ 5 │ ECS API Credentials │
└───┼─────────────────────┘
┌───┼───────────────┐
│ 6 │ IPMI Analysis │
└───┼───────────────┘
┌───┼───────────┐
│ 7 │ Autofixes │
└───┼───────────┘
    │
┌───┼──────┐
│ 0 │ Exit │
└───┴──────┘
Please make a choice: 7
┌───────────┐
│ Autofixes │
└───┬───────┘
┌───┼───────────────────────┐
│ 1 │ Change Autofix Status │
└───┼───────────────────────┘
    │ time_zone = Enabled
    │ pmon_crontab_check = Disabled
    │
┌───┼───────────┐
│ 0 │ Main Menu │
└───┴───────────┘
Please make a choice: 1
Enable time_zone? [Yes]: <Leave blank by pressing Enter>
Enable pmon_crontab_check? [No]: y
New Autofix settings:
    │ pmon_crontab_check = Enabled
    │ time_zone = Enabled
> Issue new Settings? [No]: y
2023-02-06 13:04:31,693: xDoctor_4.8-89.0 - INFO : Autofix Settings saved and distributed ...
[**] Or temporary for one session only:
- Run the following command to start the configured Autofix. Note the "Session Report" line near the end of the output; it is used in the next step.
Command:
# sudo xdoctor --rap=RAP154 --autofix=pmon_crontab_check

Expected Output Similar To:
admin@provo-orchid:~> sudo xdoctor --rap=RAP154 --autofix=pmon_crontab_check
2023-02-06 13:09:32,115: xDoctor_4.8-89.0 - INFO : Initializing xDoctor v4.8-89.0 ...
2023-02-06 13:09:32,323: xDoctor_4.8-89.0 - INFO : Removing orphaned session - session_1675433645.103
2023-02-06 13:09:32,324: xDoctor_4.8-89.0 - INFO : Starting xDoctor session_1675688971.893 ... (SYSTEM)
2023-02-06 13:09:32,324: xDoctor_4.8-89.0 - INFO : Primary Node Control Check ...
2023-02-06 13:09:32,414: xDoctor_4.8-89.0 - INFO : xDoctor Composition - Analyzer(s):ac_pmon_crontab_check
…
2023-02-06 13:09:35,984: xDoctor_4.8-89.0 - INFO : --------------------
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : Diagnosis Summary
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : --------------------
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : PSNT: Unknown
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : --------------------
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : FIXED = 1
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : CRITICAL = 0
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : CRITICAL (CACHED) = 0
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : ERROR = 0
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : ERROR (CACHED) = 0
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : WARNING = 0
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : INFO = 0
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : VERBOSE = 0
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : REPORT = 0
…
2023-02-06 13:09:35,990: xDoctor_4.8-89.0 - INFO : ----------------
2023-02-06 13:09:35,991: xDoctor_4.8-89.0 - INFO : Session Report - xdoctor --report --archive=2023-02-06_130932
2023-02-06 13:09:35,991: xDoctor_4.8-89.0 - INFO : ----------------
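The next step needs the archive name from the "Session Report" line, and the "Diagnosis Summary" counts show at a glance whether anything was fixed. Both could be extracted with a short Python sketch like the one below. It assumes the log has one entry per line in the "LEVEL : NAME = N" layout of the sample, and `parse_autofix_run` is a hypothetical helper name.

```python
import re


def parse_autofix_run(log_text: str) -> dict:
    """Collect Diagnosis Summary counters and the session archive name."""
    counts = {}
    # Matches lines such as "... - INFO : FIXED = 1" or "... CRITICAL (CACHED) = 0".
    pattern = r"INFO\s*:\s*([A-Z]+(?: \(CACHED\))?)\s*=\s*(\d+)\s*$"
    for name, value in re.findall(pattern, log_text, flags=re.MULTILINE):
        counts[name] = int(value)
    archive = re.search(r"--archive=(\S+)", log_text)
    return {"counts": counts, "archive": archive.group(1) if archive else None}


sample = """2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : FIXED = 1
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : CRITICAL = 0
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : CRITICAL (CACHED) = 0
2023-02-06 13:09:35,985: xDoctor_4.8-89.0 - INFO : INFO = 0
2023-02-06 13:09:35,991: xDoctor_4.8-89.0 - INFO : Session Report - xdoctor --report --archive=2023-02-06_130932
"""
result = parse_autofix_run(sample)
print(result["counts"].get("FIXED"), result["archive"])
```

The `archive` value is the string to pass to `xdoctor --report --archive=` in the verification step.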
- Use the "Session Report" archive name from the previous step to verify that the status is FIXED, as shown in the output below.
Command:
# sudo xdoctor --report --archive=<Session Report output>

Expected Output Similar To:
admin@provo-test1:~> sudo xdoctor --report --archive=2023-02-06_130932
xDoctor 4.8-89.0 FNM00123456789 - ECS 3.8.0.1
Displaying xDoctor Report (2023-02-06_130932) Filter:[] ...
----------------------------------------------------------------------------
FIXED - Updated the object-main crontab to include pmon on one or more nodes
----------------------------------------------------------------------------
Node = Nodes
Extra = {'Nodes': ['169.254.10.1']}
Timestamp = 2023-02-06_130932
PSNT = FNM00123456789 @ 4.8-89.
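The verification above amounts to finding the FIXED entry for pmon and reading which nodes it names. A minimal Python sketch is shown below. It assumes the report has one field per line and that the "Extra" field is a Python-style dictionary literal, as in the sample. `fixed_nodes` is a hypothetical helper name.

```python
import ast
import re


def fixed_nodes(report_text: str) -> list:
    """Return the nodes listed under a FIXED pmon entry, or [] if none is found."""
    if not re.search(r"^FIXED - .*pmon", report_text, flags=re.MULTILINE):
        return []
    extra = re.search(r"Extra\s*=\s*(\{.*\})", report_text)
    if not extra:
        return []
    # literal_eval only evaluates literals, so it is safe on untrusted text.
    return ast.literal_eval(extra.group(1)).get("Nodes", [])


sample = """FIXED - Updated the object-main crontab to include pmon on one or more nodes
Node = Nodes
Extra = {'Nodes': ['169.254.10.1']}
Timestamp = 2023-02-06_130932
"""
nodes = fixed_nodes(sample)
print(nodes)
```

An empty list means the report did not contain a FIXED pmon entry, in which case the Autofix status and the command from the previous step are worth rechecking.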
Affected Products
ECS, ECS Appliance
Article Properties
Article Number: 000208138
Article Type: Solution
Last Modified: 25 May 2023
Version: 12