ECS: OBS: xDoctor: Manual Upgrade Procedure

Summary: Manually upgrade xDoctor from the command line.

Instructions

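xDoctor is a support and diagnostic tool that identifies, and helps resolve, known configuration, software, and hardware issues that can negatively affect ECS and OBS systems. Key functions include: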

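  • Troubleshooting: helps support teams and customers determine the root cause of ECS issues.
  • Proactive monitoring: detects early signs of problems.
  • Support engagement: some operations and resolutions require the involvement of Dell Support.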

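Dell Technical Support recommends running health checks with the latest xDoctor on all VDC racks. Each xDoctor release brings improved health checks and Auto Healers. Auto Healers were introduced in later xDoctor versions.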

Determine the installed ECS/OBS and xDoctor versions:

admin@node1:~> sudo xdoctor -x
....................
ECS Version: 3.8.1.6
-----------------------
xDoctor Version: 4.8-104.0
-----------------------
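
When the version check is scripted, the two values can be pulled out of captured output. The following is a minimal sketch, assuming the output has already been saved as text and that the labels read exactly as in the sample above ("ECS Version" and "xDoctor Version"); the function name `parse_versions` is hypothetical and not part of xDoctor.

```python
import re

# Matches the two version lines exactly as they appear in the sample output.
# Assumption: other systems may label these lines differently.
VERSION_LINE = re.compile(r"^\s*(ECS|xDoctor) Version:\s*(\S+)\s*$")


def parse_versions(output: str) -> dict:
    """Return {'ECS': ..., 'xDoctor': ...} from captured `xdoctor -x` text."""
    versions = {}
    for line in output.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


# Sample text copied from the transcript in this article.
SAMPLE = """\
....................
ECS Version: 3.8.1.6
-----------------------
xDoctor Version: 4.8-104.0
-----------------------
"""

print(parse_versions(SAMPLE))
```

Lines that do not match, such as the progress dots and separators, are ignored.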


Determine whether all nodes have the same xDoctor version:

admin@node1:~> sudo xdoctor -s
xDoctor Uniform on all nodes: 4.8-104.0


Manual xDoctor upgrade:

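  1. Download the latest xDoctor version that matches the ECS/OBS version identified above. Which xDoctor package to download depends on the code version running on ECS or OBS; note the following version formats:
    1. Downloading the latest version requires a login.
      1. ECS (3.8 and lower): xDoctor packages start with 4.x (example: 4.8-105.0) - ECS product support web page
      2. OBS (3.9 and higher): xDoctor packages start with 5.x (example: 5.1-105.0) - OBS product support web page
      3. The xDoctor release notes for the downloaded version include a section dedicated to upgrading xDoctor.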
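  2. Upload the latest xDoctor RPM file to the /home/admin directory of a node on the rack. The example below is for ECS; for OBS, follow the same steps with the file name changed accordingly:
admin@node1:~> ls -l | grep xDoctor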
-rw-r--r-- 1 admin users   20057045 Jul  6  2025 xDoctor4ECS-4.8-104.0.noarch.rpm
-rw-r--r-- 1 admin users   31927626 Aug 26 15:11 xDoctor4ECS-4.8-105.0.noarch.rpm
  3. Run the xDoctor upgrade command. The upgrade can be performed per rack or per VDC.
Rack upgrade command (type A to enable all Auto Healers):
# sudo xdoctor --upgrade --local=/home/admin/xDoctor4ECS-4.8-105.0.noarch.rpm

admin@node1:~> sudo xdoctor --upgrade --local=/home/admin/xDoctor4ECS-4.8-105.0.noarch.rpm
This new xDoctor RPM has the following Auto Healers:

┌──────────────┐
│ Auto Healers │
└───┬──────────┘
    │
    │ time_zone                              = Enabled
    │ pmon_crontab_check                     = Disabled
    │ pmon_swapiness_check                   = Disabled
    │ rsyslogd_check                         = Disabled
    │ task_md_cleanup_status                 = Disabled
    │ ntpd_not_running                 (New) = Disabled
    │ cron_not_running                 (New) = Disabled
    │ machines_file_error              (New) = Disabled
    │ non_uniform_psnt                 (New) = Disabled
    │ racadm_stale_pid                 (New) = Disabled
    │ obj_control_svc_check            (New) = Disabled

In order to have them active, they need to be enabled ...
You can do this during this upgrade or later post upgrade via `xdoctor --config`

Would you like to enable (A)ll, only the (N)ew ones or (I)gnore them [I]: A

2025-10-09 11:36:15,663: xDoctor_4.8-104.0 - INFO    : User selected to enable all Auto Healers ...
2025-10-09 11:36:15,663: xDoctor_4.8-104.0 - INFO    : Local Upgrade (/home/admin/xDoctor4ECS-4.8-105.0.noarch.rpm)
2025-10-09 11:36:15,696: xDoctor_4.8-104.0 - INFO    : Current Installed xDoctor version is 4.8-104.0
2025-10-09 11:36:15,712: xDoctor_4.8-104.0 - INFO    : Requested package version is 4.8-105.0
2025-10-09 11:36:15,713: xDoctor_4.8-104.0 - INFO    : Updating xDoctor RPM Package (RPM)
2025-10-09 11:36:15,935: xDoctor_4.8-104.0 - INFO    :  - Distribute package
2025-10-09 11:36:17,402: xDoctor_4.8-104.0 - INFO    :  - Install new rpm package
2025-10-09 11:36:33,562: xDoctor_4.8-104.0 - INFO    : Enabling Auto Healer: pmon_crontab_check ...
2025-10-09 11:36:33,563: xDoctor_4.8-104.0 - INFO    : Enabling Auto Healer: pmon_swapiness_check ...
2025-10-09 11:36:33,563: xDoctor_4.8-104.0 - INFO    : Enabling Auto Healer: rsyslogd_check ...
2025-10-09 11:36:33,563: xDoctor_4.8-104.0 - INFO    : Enabling Auto Healer: task_md_cleanup_status ...
2025-10-09 11:36:33,563: xDoctor_4.8-104.0 - INFO    : Enabling Auto Healer: ntpd_not_running ...
2025-10-09 11:36:33,563: xDoctor_4.8-104.0 - INFO    : Enabling Auto Healer: cron_not_running ...
2025-10-09 11:36:33,563: xDoctor_4.8-104.0 - INFO    : Enabling Auto Healer: machines_file_error ...
2025-10-09 11:36:33,563: xDoctor_4.8-104.0 - INFO    : Enabling Auto Healer: non_uniform_psnt ...
2025-10-09 11:36:33,563: xDoctor_4.8-104.0 - INFO    : Enabling Auto Healer: racadm_stale_pid ...
2025-10-09 11:36:33,563: xDoctor_4.8-104.0 - INFO    : Enabling Auto Healer: obj_control_svc_check ...
2025-10-09 11:36:33,828: xDoctor_4.8-104.0 - INFO    : Auto Healer Settings saved and distributed ...

┌──────────────────────┐
│ Updated Auto Healers │
└───┬──────────────────┘
    │
    │ time_zone                              = Enabled
    │ pmon_crontab_check                     = Enabled
    │ pmon_swapiness_check                   = Enabled
    │ rsyslogd_check                         = Enabled
    │ task_md_cleanup_status                 = Enabled
    │ ntpd_not_running                 (New) = Enabled
    │ cron_not_running                 (New) = Enabled
    │ machines_file_error              (New) = Enabled
    │ non_uniform_psnt                 (New) = Enabled
    │ racadm_stale_pid                 (New) = Enabled
    │ obj_control_svc_check            (New) = Enabled

2025-10-09 11:36:33,829: xDoctor_4.8-104.0 - INFO    : xDoctor successfully updated to version 4.8-105.0
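
To confirm which Auto Healers are still disabled after an upgrade, the table printed in the transcript above can be parsed from captured text. This is a rough sketch that assumes the row layout shown in this article (a box-drawing bar, the healer name, an optional "(New)" tag, then "= Enabled" or "= Disabled"); `parse_auto_healers` is a hypothetical helper name.

```python
import re

# One table row: "│ name   (New) = Enabled"; the "(New)" tag is optional.
ROW = re.compile(r"^\s*│\s*(\w+)\s*(\(New\))?\s*=\s*(Enabled|Disabled)\s*$")


def parse_auto_healers(table: str) -> dict:
    """Map each Auto Healer name to True (Enabled) or False (Disabled)."""
    states = {}
    for line in table.splitlines():
        match = ROW.match(line)
        if match:
            states[match.group(1)] = match.group(3) == "Enabled"
    return states


# A few rows copied from the pre-upgrade table in this article.
SAMPLE = """\
    │ time_zone                              = Enabled
    │ pmon_crontab_check                     = Disabled
    │ ntpd_not_running                 (New) = Disabled
"""

states = parse_auto_healers(SAMPLE)
disabled = [name for name, enabled in states.items() if not enabled]
print(disabled)
```

An empty `disabled` list after the upgrade would indicate that every listed Auto Healer reports Enabled.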
VDC upgrade command (type A to enable all Auto Healers):
# sudo xdoctor --upgrade --vdc-upgrade --local=/home/admin/xDoctor4ECS-4.8-105.0.noarch.rpm
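
The package choice and the two upgrade commands above can be summarized in a short sketch. It assumes only what this article states: systems at 3.8 and lower use xDoctor 4.x, systems at 3.9 and higher use 5.x, and the upgrade takes `--upgrade`, `--local=<path>`, and optionally `--vdc-upgrade`. Both function names are hypothetical, and the sketch only builds the command; it does not run it.

```python
def expected_xdoctor_major(system_version: str) -> int:
    """Return 4 for version 3.8 and lower, 5 for version 3.9 and higher."""
    major, minor = (int(part) for part in system_version.split(".")[:2])
    return 4 if (major, minor) <= (3, 8) else 5


def build_upgrade_command(rpm_path: str, vdc: bool = False) -> list:
    """Assemble the upgrade command from this article as an argument list."""
    command = ["sudo", "xdoctor", "--upgrade"]
    if vdc:
        # Per-VDC upgrade instead of per-rack.
        command.append("--vdc-upgrade")
    command.append(f"--local={rpm_path}")
    return command


rpm = "/home/admin/xDoctor4ECS-4.8-105.0.noarch.rpm"
print(expected_xdoctor_major("3.8.1.6"))
print(" ".join(build_upgrade_command(rpm, vdc=True)))
```

Returning a list rather than a single string keeps the arguments separate, which is convenient if the command is later passed to a process launcher.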
 

xDoctor is not uniform across all nodes:

If a node is reinstalled or offline during an xDoctor update, it can end up on a different version from the rest of the cluster:
admin@node1:~> sudo xdoctor -s
xDoctor not uniform across all nodes ...
Trying xDoctor Resync ...
Resync failed: No xDoctor package found for re-installation
[4.8-104.0] -> ['169.254.1.2']
[4.8-105.0] -> ['169.254.1.6', '169.254.1.5', '169.254.1.4', '169.254.1.3', '169.254.1.1']
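
To find which nodes are out of step, the version-to-node listing can be parsed from captured output. The sketch below assumes the two message shapes shown in this article (the "Uniform on all nodes" line and the `[version] -> [nodes]` lines); `parse_status` is a hypothetical helper, and treating the version held by the most nodes as the reference is an illustrative choice, not xDoctor behavior.

```python
import re

UNIFORM = re.compile(r"^xDoctor Uniform on all nodes:\s*(\S+)\s*$")
MAPPING = re.compile(r"^\[([^\]]+)\]\s*->\s*\[(.*)\]\s*$")


def parse_status(output: str):
    """Return (uniform_version, version_map) from captured `xdoctor -s` text."""
    uniform_version = None
    version_map = {}
    for raw in output.splitlines():
        line = raw.strip()
        uniform = UNIFORM.match(line)
        if uniform:
            # Drop a trailing period if the line ends with one.
            uniform_version = uniform.group(1).rstrip(".")
            continue
        mapping = MAPPING.match(line)
        if mapping:
            nodes = re.findall(r"'([^']+)'", mapping.group(2))
            version_map[mapping.group(1)] = nodes
    return uniform_version, version_map


# Sample text copied from the transcript in this article.
SAMPLE = """\
xDoctor not uniform across all nodes ...
Trying xDoctor Resync ...
Resync failed: No xDoctor package found for re-installation
[4.8-104.0] -> ['169.254.1.2']
[4.8-105.0] -> ['169.254.1.6', '169.254.1.5', '169.254.1.4', '169.254.1.3', '169.254.1.1']
"""

version, version_map = parse_status(SAMPLE)
majority = max(version_map, key=lambda v: len(version_map[v]))
outliers = [n for v, nodes in version_map.items() if v != majority for n in nodes]
print(majority, outliers)
```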
xDoctor raises an alert that it is not uniform across all nodes of the rack:
admin@node1:~> sudo xdoctor --report --archive=2022-06-26_101004 -WEC

xDoctor 4.8.105.0
CKM00xxxxxxxx - ECS 3.8.1.2

Displaying xDoctor Report (2022-06-26_101004) Filter:['CRITICAL', 'ERROR', 'WARNING'] ...

--------------------------------------------
ERROR - xDoctor not uniform across all nodes
--------------------------------------------
Extra     = Not allowed to use the SYSTEM scope, use LOCAL scope instead or reinstall xDoctor -> xdr_versions={'4.8-104.0': ['169.254.1.2'], '4.8-105.0': ['169.254.1.1', '169.254.1.6', '169.254.1.4', '169.254.1.3', '169.254.1.5']}
RAP       = RAP099
Solution  = KB 91703
Timestamp = 2022-06-26_101004
PSNT      = CKM00xxxxxxxx @ 4.8.105.0
Attempting the xDoctor upgrade fails because xDoctor detects that other nodes already have the version you are trying to install:
admin@node1:~> sudo xdoctor --upgrade --local=/home/admin/xDoctor4ECS-4.8-105.0.noarch.rpm
2022-06-28 15:11:40,101: xDoctor_4.8-105.0 - INFO    : xDoctor Upgrader Instance (2:FTP_SFTP)
2022-06-28 15:11:40,101: xDoctor_4.8-105.0 - INFO    : Local Upgrade (/home/admin/xDoctor4ECS-4.8-105.0.noarch.rpm)
2022-06-28 15:11:40,134: xDoctor_4.8-105.0 - INFO    : Current Installed xDoctor version is 4.8-105.0
2022-06-28 15:11:40,174: xDoctor_4.8-105.0 - INFO    : Requested package version is 4.8-105.0
2022-06-28 15:11:40,174: xDoctor_4.8-105.0 - INFO    : xDoctor is up-to-date, only newer versions allowed ...
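
The "only newer versions allowed" message suggests a comparison between the installed and the requested version. xDoctor's actual comparison logic is not documented here, so the following is only an illustrative sketch: it assumes version strings of the form `4.8-105.0` and RPM names ending in `-<version>.noarch.rpm`, as seen in this article. All three function names are hypothetical.

```python
import re


def version_key(version: str) -> tuple:
    """Turn '4.8-105.0' into (4, 8, 105, 0) so versions compare numerically."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def package_version(rpm_path: str):
    """Pull the version out of a name like 'xDoctor4ECS-4.8-105.0.noarch.rpm'."""
    match = re.search(r"-(\d+\.\d+-\d+\.\d+)\.noarch\.rpm$", rpm_path)
    return match.group(1) if match else None


def is_newer(requested: str, installed: str) -> bool:
    """True only when the requested version is strictly newer."""
    return version_key(requested) > version_key(installed)


rpm = "/home/admin/xDoctor4ECS-4.8-105.0.noarch.rpm"
requested = package_version(rpm)
installed = "4.8-105.0"
if is_newer(requested, installed):
    print("plain --upgrade should be accepted")
else:
    print("same or older version: a plain --upgrade is expected to be refused")
```

Comparing integer tuples avoids the string-comparison pitfall where "4.8-99.0" would sort after "4.8-105.0".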
If some nodes already have the latest version but you want to reapply the same xDoctor version across the rack, use the xDoctor reinstall option:
admin@node1:~> sudo xdoctor --upgrade --local=/home/admin/xDoctor4ECS-4.8-105.0.noarch.rpm --reinstall
2022-06-28 15:12:53,079: xDoctor_4.8-105.0 - INFO    : xDoctor Upgrader Instance (2:FTP_SFTP)
2022-06-28 15:12:53,079: xDoctor_4.8-105.0 - INFO    : Local Upgrade (/home/admin/xDoctor4ECS-4.8-105.0.noarch.rpm)
2022-06-28 15:12:53,112: xDoctor_4.8-105.0 - INFO    : Current Installed xDoctor version is 4.8-105.0
2022-06-28 15:12:53,147: xDoctor_4.8-105.0 - INFO    : Requested package version is 4.8-105.0
2022-06-28 15:12:53,148: xDoctor_4.8-105.0 - WARNING : (Re)installing requested xDoctor package ...
2022-06-28 15:12:53,148: xDoctor_4.8-105.0 - INFO    : Updating xDoctor RPM Package (RPM)
2022-06-28 15:12:53,244: xDoctor_4.8-105.0 - INFO    :  - Distribute package
2022-06-28 15:12:54,115: xDoctor_4.8-105.0 - INFO    :  - Install new rpm package
2022-06-28 15:13:08,544: xDoctor_4.8-105.0 - INFO    : xDoctor successfully updated to version 4.8-105.0
2020-09-01 09:04:30,184: xDoctor_4.8-105.0 - INFO    : xDoctor Activation skipped. Only tested on Rack Master 
Determine whether all nodes now have the same xDoctor version:
admin@node1:~> sudo xdoctor -s
xDoctor Uniform on all nodes: 4.8-105.0.
Recheck the xDoctor version:
admin@node1:~> sudo xdoctor -x
........
ECS Version: 3.8.1.6
-----------------------
xDoctor Version: 4.8-105.0
-----------------------
Determine whether all nodes have the same xDoctor version:
admin@node1:~> sudo xdoctor -s
xDoctor Uniform on all nodes: 4.8-105.0
 

Run an xDoctor health check:

To run an xDoctor health check, first clear the cache, then run xDoctor:
admin@node1:~> sudo xdoctor --clear
You are about to clear all xDoctor Cache files. Are you sure you want to proceed?  [No]: yes
2020-09-07 14:03:34,263: xDoctor_4.8-105.0- INFO    : xDoctor Cache files successfully cleared.
To run the xDoctor health check:
admin@node1:~> sudo xdoctor --hr
The --hr option attempts to send the latest xDoctor health check report to ECS Support. It is only an attempt, because the connection may be blocked.

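xDoctor runs from a single node that acts as the master and pulls the xDoctor reports from the other nodes onto that node. Run xDoctor on that node, which defaults to the first node on the rack. If that node is not responding to the VDC, the next node (node 2) becomes the master. See the "xDoctor Activation skipped" message in the xDoctor upgrade output above: activation is only active on the master node.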

At the end of a run, xDoctor generates a session report. The items to check in the xDoctor report are those marked "WARNING", "ERROR", or "CRITICAL":
admin@node1:~> sudo xdoctor --hr
....................
2020-09-07 13:58:33,520: xDoctor_4.8-105.0 - INFO    : PSNT: CKM00xxxxxxxxxx
2020-09-07 13:58:33,520: xDoctor_4.8-105.0 - INFO    : --------------------
2020-09-07 13:58:33,520: xDoctor_4.8-105.0 - INFO    : Number of CRITICAL:    0
2020-09-07 13:58:33,520: xDoctor_4.8-105.0 - INFO    : Number of FIXED:         0
2020-09-07 13:58:33,520: xDoctor_4.8-105.0 - INFO    : Number of ERROR:       0
2020-09-07 13:58:33,520: xDoctor_4.8-105.0 - INFO    : Number of WARNING:     1
....................
....................
2020-09-07 13:58:42,910: xDoctor_4.8-105.0 - INFO    : --------------------------
2020-09-07 13:58:42,910: xDoctor_4.8-105.0 - INFO    : Session Report                - xdoctor --report --archive=2020-09-07_135109
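
The severity counts and the session archive ID can be extracted from captured health check output, so that the follow-up report command is built without retyping the ID. This is a minimal sketch that assumes the log layout shown above ("Number of <SEVERITY>: <n>" and "--archive=<id>"); `summarize_run` is a hypothetical helper.

```python
import re


def summarize_run(output: str):
    """Return (counts, archive_id) from captured `xdoctor --hr` text."""
    counts = {
        match.group(1): int(match.group(2))
        for match in re.finditer(r"Number of (\w+):\s*(\d+)", output)
    }
    archive = re.search(r"--archive=(\S+)", output)
    archive_id = archive.group(1) if archive else None
    return counts, archive_id


# Sample lines copied from the transcript in this article.
SAMPLE = """\
2020-09-07 13:58:33,520: xDoctor_4.8-105.0 - INFO    : Number of CRITICAL:    0
2020-09-07 13:58:33,520: xDoctor_4.8-105.0 - INFO    : Number of FIXED:         0
2020-09-07 13:58:33,520: xDoctor_4.8-105.0 - INFO    : Number of ERROR:       0
2020-09-07 13:58:33,520: xDoctor_4.8-105.0 - INFO    : Number of WARNING:     1
2020-09-07 13:58:42,910: xDoctor_4.8-105.0 - INFO    : Session Report                - xdoctor --report --archive=2020-09-07_135109
"""

counts, archive_id = summarize_run(SAMPLE)
needs_review = any(counts.get(level, 0) for level in ("CRITICAL", "ERROR", "WARNING"))
if needs_review and archive_id:
    # Same report command as used elsewhere in this article.
    print(f"sudo xdoctor --report --archive={archive_id} -WEC")
```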
To review the xDoctor report:
sudo xdoctor --report --archive=2022-xxxxxxxxxxxxx -WEC
Example:
admin@node1:~> sudo xdoctor --report --archive=2XXX-0X-01_0XXXX9 -WEC

xDoctor 4.8-105.0
CKM00xxxxxxxxx - ECS 3.8 Patch 2 (??) - 3.8.0.2 

Displaying xDoctor Report (2XX0-0X-01_0XXXX9) Filter:['CRITICAL', 'ERROR', 'WARNING'] ...

Timestamp    = 2XX0-0X-01_0XXXX9
    Category = Health
    Source   = Disk
    Severity = ERROR
    Node     = 169.254.1.1
    Message  = Boot device is not accessible
    Extra    = {'1XX.2XX.X.5': ['/dev/sdl']}
    RAP      = RAP004
    Solution = 46306
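The results may include a recommended action plan (RAP) code. This code is useful when searching for linked knowledge base articles that relate to the alert.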
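
To collect the RAP code and the Solution reference from a report entry, the "Key = Value" lines can be read into a dictionary. The sketch below assumes the entry layout shown in the example above; `parse_report_entry` is a hypothetical helper, and it handles a single entry only.

```python
def parse_report_entry(text: str) -> dict:
    """Read 'Key = Value' lines of one report entry into a dictionary."""
    fields = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values that contain '=' stay intact.
        key, separator, value = line.partition("=")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


# Entry copied from the example report in this article.
SAMPLE = """\
Timestamp    = 2XX0-0X-01_0XXXX9
    Category = Health
    Source   = Disk
    Severity = ERROR
    Node     = 169.254.1.1
    Message  = Boot device is not accessible
    Extra    = {'1XX.2XX.X.5': ['/dev/sdl']}
    RAP      = RAP004
    Solution = 46306
"""

entry = parse_report_entry(SAMPLE)
print(entry["Severity"], entry["RAP"], entry["Solution"])
```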

Affected Products

ECS Appliance

Products

ECS Appliance, ECS Appliance Hardware Gen1 U-Series, ECS Appliance Software with Encryption, ECS Appliance Software without Encryption
Article Properties
Article Number: 000021704
Article Type: How To
Last Modified: 07 Nov 2025
Version:  14