ECS: OBS: xDoctor: RAP073/208: Switch Connection Failure Detected
Summary: This knowledge base article explains how to handle the Switch Connection Failure detected alert.
Symptoms
Starting with ECS xDoctor v4.8-109.0 and ObjectScale xDoctor v5.1-109.0, RAP208 (Switch Connection Failure detected) is implemented as an Auto Healer. When switch connectivity problems exceed the configured Error or Critical severity threshold, xDoctor raises a RAP208 alert and automatically starts its built-in repair orchestration workflow. This workflow performs the necessary corrective actions if the xDoctor Auto Healers are enabled.
NOTE: If your environment runs an xDoctor version earlier than ECS xDoctor v4.8-109.0 or ObjectScale xDoctor v5.1-109.0, the RAP208 auto-healing functionality is not available. On these versions, remediation must be performed through the Autopilot process described below or by following the manual remediation steps described in the Resolution section.
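The version gate in the note above can be sketched in Python. This is only an illustration: the product keys `ecs` and `objectscale`, the function names, and the assumption that xDoctor versions order naturally as four integers are all hypothetical and not part of xDoctor itself.

```python
import re

# Minimum versions quoted in this article. The dictionary keys are
# illustrative labels, not identifiers used by xDoctor.
MIN_AUTOHEAL_VERSION = {
    "ecs": (4, 8, 109, 0),
    "objectscale": (5, 1, 109, 0),
}


def parse_xdoctor_version(text):
    """Turn '4.8-109.0' or 'v4.8-109.0' into a tuple of four integers."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)-(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized xDoctor version: {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_rap208_autoheal(product, version_text):
    """Return True when the version is at or above the documented minimum.

    Assumes versions compare field by field, which is an assumption made
    for this sketch.
    """
    return parse_xdoctor_version(version_text) >= MIN_AUTOHEAL_VERSION[product]


if __name__ == "__main__":
    for product, version in (("ecs", "4.8-104.0"), ("ecs", "4.8-109.0")):
        print(product, version, supports_rap208_autoheal(product, version))
```

A result of False means the Autopilot or manual path applies instead of the Auto Healer.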
Alert that triggers RAP208 auto-healing
The RAP208 auto-healing workflow is triggered when switch connectivity failures exceed the configured Error or Critical severity threshold. Once this threshold is exceeded, xDoctor raises a RAP208 alert, which serves as the trigger for the automated repair process.
Example alert output
NOTE: In xDoctor versions earlier than ECS xDoctor v4.8-109.0 and ObjectScale xDoctor v5.1-109.0, this condition only raises an alert. No automatic remediation is performed.
--------------------------------------------------------
INFO - Auto Healer for dell_switch_connectivity disabled
--------------------------------------------------------
Extra = Auto Healer for dell_switch_connectivity disabled
Timestamp = 2026-04-01_180132
PSNT = CKMXXXXXXXXXXX @ 4.8-109.0
----------------------------------------------------
ERROR - (Cached) Switch Connection Failure detected.
----------------------------------------------------
Node = 169.254.1.1
Extra = {"169.254.1.1": ["hare"]}
RAP = RAP208
Solution = KB 39838
Timestamp = 2026-04-01_180132
PSNT = CKMXXXXXXXXXXX @ 4.8-109.0
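When many alerts need to be triaged, the text layout shown above (a severity line between dashed rules, followed by `Key = Value` fields) can be parsed into dictionaries. The following is a minimal sketch that assumes the report always follows that layout; `parse_alerts` and `SAMPLE_REPORT` are names invented here, not part of xDoctor.

```python
import json
import re

HEADER = re.compile(r"^([A-Z]+) - (.+)$")   # e.g. "ERROR - (Cached) Switch ..."
FIELD = re.compile(r"^(\w+) = (.*)$")       # e.g. "RAP = RAP208"

SAMPLE_REPORT = """\
--------------------------------------------------------
INFO - Auto Healer for dell_switch_connectivity disabled
--------------------------------------------------------
Extra = Auto Healer for dell_switch_connectivity disabled
Timestamp = 2026-04-01_180132
PSNT = CKMXXXXXXXXXXX @ 4.8-109.0
----------------------------------------------------
ERROR - (Cached) Switch Connection Failure detected.
----------------------------------------------------
Node = 169.254.1.1
Extra = {"169.254.1.1": ["hare"]}
RAP = RAP208
Solution = KB 39838
Timestamp = 2026-04-01_180132
PSNT = CKMXXXXXXXXXXX @ 4.8-109.0
"""


def parse_alerts(report_text):
    """Split report text into one dictionary per alert."""
    alerts = []
    for raw in report_text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and dashed rules
        header = HEADER.match(line)
        if header:
            alerts.append({"severity": header.group(1), "message": header.group(2)})
            continue
        field = FIELD.match(line)
        if field and alerts:
            key, value = field.groups()
            if key == "Extra":
                try:
                    value = json.loads(value)  # JSON when it maps nodes to switches
                except ValueError:
                    pass                       # otherwise keep the plain string
            alerts[-1][key] = value
    return alerts


if __name__ == "__main__":
    for alert in parse_alerts(SAMPLE_REPORT):
        print(alert["severity"], alert.get("RAP"), alert.get("Extra"))
```

Filtering the result on `RAP == "RAP208"` gives the affected node and switch names from the `Extra` field.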
Auto Healer remediation (example)
When Auto Healers are enabled, xDoctor automatically starts remediating the detected switch connectivity problems by applying the common corrective actions described in this knowledge base article.
--------------------------------------------------------
FIXED - Auto Healer fixed Dell switch connectivity issue
--------------------------------------------------------
Node = Nodes
Extra = {"Nodes": ["169.254.1.1"]}
Timestamp = 2026-04-01_180344
PSNT = CKMXXXXXXXXXXX @ 4.8-109.0
Auto Healing requirement
For this remediation to take place, the xDoctor Auto Healer feature must be enabled. Auto Healers can be enabled during installation or after installation by following the steps described in:
Knowledge base article: ECS: xDoctor: How to enable xDoctor Auto Healer after tool installation
Cause
After a switch replacement, the SSH host keys used to authenticate to the switch may change, or the management interface that connects to the switch may be administratively shut down. In some cases, the password configured in xDoctor does not match the current password of the affected switch and must be updated accordingly.
The xDoctor automation and auto-healing workflows do not remediate switch passwords. Instead, xDoctor detects authentication-related failures and raises the appropriate alert, which directs the user to the relevant knowledge base article describing how to configure xDoctor to use the password configured on the switches.
Resolution
xDoctor Auto Healer: ObjectScale xDoctor v5.1-109.0/ECS xDoctor v4.8-109.0 or later
- To manually trigger the enabled auto-healing, run the following command on the master.rack node. This starts the rack analyzers, which validate and automatically repair the nodes one at a time.
# sudo xdoctor --rap=RAP208
Example:
admin@ecsnode1:~> sudo xdoctor --rap=RAP208
2026-04-01 18:03:45,441: xDoctor_4.8-109.0 - INFO : Initializing xDoctor v4.8-109.0 ...
[... Truncated Output ...]
2026-04-01 18:05:01,725: xDoctor_4.8-109.0 - INFO : ANALYZER [ac_dell_switch_connectivity]
2026-04-01 18:05:02,063: xDoctor_4.8-109.0 - INFO : Autohealing switch_connectivity on node 169.254.1.1 ...
2026-04-01 18:08:57,494: xDoctor_4.8-109.0 - INFO : All data analyzed in 0:03:55
2026-04-01 18:08:58,529: xDoctor_4.8-109.0 - INFO : --------------------
2026-04-01 18:08:58,529: xDoctor_4.8-109.0 - INFO : Diagnosis Summary
2026-04-01 18:08:58,529: xDoctor_4.8-109.0 - INFO : --------------------
2026-04-01 18:08:58,529: xDoctor_4.8-109.0 - INFO : PSNT: CKMXXXXXXXXXXX
2026-04-01 18:08:58,529: xDoctor_4.8-109.0 - INFO : --------------------
2026-04-01 18:08:58,529: xDoctor_4.8-109.0 - INFO : FIXED = 1
2026-04-01 18:08:58,530: xDoctor_4.8-109.0 - INFO : CRITICAL = 0
2026-04-01 18:08:58,530: xDoctor_4.8-109.0 - INFO : CRITICAL (CACHED) = 0
2026-04-01 18:08:58,530: xDoctor_4.8-109.0 - INFO : ERROR = 0
2026-04-01 18:08:58,530: xDoctor_4.8-109.0 - INFO : ERROR (CACHED) = 0
2026-04-01 18:08:58,530: xDoctor_4.8-109.0 - INFO : WARNING = 0
2026-04-01 18:08:58,530: xDoctor_4.8-109.0 - INFO : INFO = 0
2026-04-01 18:08:58,530: xDoctor_4.8-109.0 - INFO : VERBOSE = 0
2026-04-01 18:08:58,531: xDoctor_4.8-109.0 - INFO : REPORT = 0
2026-04-01 18:08:58,646: xDoctor_4.8-109.0 - INFO : ---------------------
2026-04-01 18:08:58,646: xDoctor_4.8-109.0 - INFO : xDoctor Post Features
2026-04-01 18:08:58,646: xDoctor_4.8-109.0 - INFO : ----------------
2026-04-01 18:08:58,646: xDoctor_4.8-109.0 - INFO : Data Combiner
2026-04-01 18:08:58,646: xDoctor_4.8-109.0 - INFO : -------------
2026-04-01 18:08:58,647: xDoctor_4.8-109.0 - INFO : Created a Data Collection Report (data.xml)
2026-04-01 18:08:58,648: xDoctor_4.8-109.0 - INFO : ------
2026-04-01 18:08:58,648: xDoctor_4.8-109.0 - INFO : SysLog
2026-04-01 18:08:58,648: xDoctor_4.8-109.0 - INFO : ------
2026-04-01 18:08:58,648: xDoctor_4.8-109.0 - INFO : Using Fabric as Syslog Server
2026-04-01 18:08:58,648: xDoctor_4.8-109.0 - INFO : Not triggered ... no WARNING, ERROR, nor CRITICAL
2026-04-01 18:08:58,648: xDoctor_4.8-109.0 - INFO : ----
2026-04-01 18:08:58,648: xDoctor_4.8-109.0 - INFO : SNMP
2026-04-01 18:08:58,648: xDoctor_4.8-109.0 - INFO : ----
2026-04-01 18:08:58,649: xDoctor_4.8-109.0 - INFO : Using 10.118.165.48:162 as SNMP server
2026-04-01 18:08:58,649: xDoctor_4.8-109.0 - INFO : Not triggered .. no WARNING, ERROR nor CRITICAL
2026-04-01 18:08:58,649: xDoctor_4.8-109.0 - INFO : ------------
2026-04-01 18:08:58,649: xDoctor_4.8-109.0 - INFO : ProcComplete
2026-04-01 18:08:58,649: xDoctor_4.8-109.0 - INFO : ------------
2026-04-01 18:08:58,649: xDoctor_4.8-109.0 - WARNING : ProcComplete is disabled, please re-enable it (xdoctor --config)
2026-04-01 18:08:58,767: xDoctor_4.8-109.0 - INFO : ----------------
2026-04-01 18:08:58,767: xDoctor_4.8-109.0 - INFO : Session Archiver
2026-04-01 18:08:58,768: xDoctor_4.8-109.0 - INFO : ----------------
2026-04-01 18:08:58,777: xDoctor_4.8-109.0 - INFO : Session Stored in folder - /usr/local/xdoctor/archive/other/2026-04-01_180344
2026-04-01 18:08:58,777: xDoctor_4.8-109.0 - INFO : Session Archived as tar - /usr/local/xdoctor/archive/other/xDoctor-CKMXXXXXXXXXXX-2026-04-01_180344.tgz
2026-04-01 18:08:58,777: xDoctor_4.8-109.0 - INFO : --------------------------
2026-04-01 18:08:58,777: xDoctor_4.8-109.0 - INFO : Session Report - sudo xdoctor --report --archive=2026-04-01_180344
2026-04-01 18:08:58,777: xDoctor_4.8-109.0 - INFO : ---------------
2026-04-01 18:08:58,777: xDoctor_4.8-109.0 - INFO : Session Cleaner
2026-04-01 18:08:58,777: xDoctor_4.8-109.0 - INFO : ---------------
2026-04-01 18:08:58,789: xDoctor_4.8-109.0 - INFO : Removing folder (count limit) - /usr/local/xdoctor/archive/other/2026-04-01_170120
2026-04-01 18:08:58,790: xDoctor_4.8-109.0 - INFO : Removing archive (count limit) - /usr/local/xdoctor/archive/other/xDoctor-CKMXXXXXXXXXXX-2026-04-01_170120.tgz
2026-04-01 18:08:58,793: xDoctor_4.8-109.0 - INFO : Cleaned 2 archived session(s)
2026-04-01 18:08:58,793: xDoctor_4.8-109.0 - INFO : -------
2026-04-01 18:08:58,794: xDoctor_4.8-109.0 - INFO : Emailer
2026-04-01 18:08:58,794: xDoctor_4.8-109.0 - INFO : -------
2026-04-01 18:08:58,794: xDoctor_4.8-109.0 - INFO : Using Dedicated Server (25:25) as SMTP Server ...
2026-04-01 18:08:58,794: xDoctor_4.8-109.0 - INFO : Email Type = Individual Events
2026-04-01 18:08:58,795: xDoctor_4.8-109.0 - INFO : ------------------------------
2026-04-01 18:08:58,795: xDoctor_4.8-109.0 - INFO : xDoctor session_1775066624.943 finished in 0:05:13
2026-04-01 18:08:58,813: xDoctor_4.8-109.0 - INFO : Successful Job:1775066624 Exit Code:192
- Run the session report to review the results of the manual auto-healing run.
# sudo xdoctor --report --archive=<session report>
Example:
admin@ecsnode1:~> sudo xdoctor --report --archive=2026-04-01_180344
xDoctor 4.8-109.0
CKMXXXXXXXXXXX - ECS 3.8.1.4
Displaying xDoctor Report (2026-04-01_180344) Filter:[] ...
--------------------------------------------------------
FIXED - Auto Healer fixed Dell switch connectivity issue
--------------------------------------------------------
Node = Nodes
Extra = {"Nodes": ["169.254.1.1"]}
Timestamp = 2026-04-01_180344
PSNT = CKMXXXXXXXXXXX @ 4.8-109.0
- If a failure occurs, open an SR to investigate it.
Example failure:
----------------------------------------------------
ERROR - (Cached) Auto fix failed - Switch Connection Failure detected.
----------------------------------------------------
Node = 169.254.1.1
Extra = {"169.254.1.1": ["hare"]}
RAP = RAP208
Solution = KB 39838
Timestamp = 2026-04-01_180132
PSNT = CKMXXXXXXXXXXX @ 4.8-109.0
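To decide quickly whether a manual run needs follow-up, the Diagnosis Summary counters printed at the end of an `xdoctor --rap=RAP208` run can be extracted from the captured log. The sketch below assumes the log has one entry per line in the `... - INFO : NAME = N` form seen in the example run; the function names and the rule of flagging any non-zero ERROR or CRITICAL counter are choices made for this illustration.

```python
import re

# Matches counter lines such as "... - INFO : ERROR (CACHED) = 0".
SUMMARY_LINE = re.compile(r" - INFO\s*: ([A-Z]+(?: \(CACHED\))?) = (\d+)\s*$")

SAMPLE_LOG = """\
2026-04-01 18:08:58,529: xDoctor_4.8-109.0 - INFO : Diagnosis Summary
2026-04-01 18:08:58,529: xDoctor_4.8-109.0 - INFO : FIXED = 1
2026-04-01 18:08:58,530: xDoctor_4.8-109.0 - INFO : CRITICAL = 0
2026-04-01 18:08:58,530: xDoctor_4.8-109.0 - INFO : CRITICAL (CACHED) = 0
2026-04-01 18:08:58,530: xDoctor_4.8-109.0 - INFO : ERROR = 0
2026-04-01 18:08:58,530: xDoctor_4.8-109.0 - INFO : ERROR (CACHED) = 0
2026-04-01 18:08:58,794: xDoctor_4.8-109.0 - INFO : Email Type = Individual Events
"""


def diagnosis_summary(log_text):
    """Collect the summary counters from a captured xDoctor log."""
    counts = {}
    for line in log_text.splitlines():
        match = SUMMARY_LINE.search(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def needs_follow_up(counts):
    """True when any ERROR or CRITICAL counter, cached or not, is non-zero."""
    keys = ("CRITICAL", "CRITICAL (CACHED)", "ERROR", "ERROR (CACHED)")
    return any(counts.get(key, 0) > 0 for key in keys)


if __name__ == "__main__":
    summary = diagnosis_summary(SAMPLE_LOG)
    print(summary, "follow-up needed:", needs_follow_up(summary))
```

A non-zero ERROR or CRITICAL counter after the run corresponds to the failure case described above, where the session report should be reviewed and an SR opened.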
xDoctor Autopilot:
This knowledge base (KB) article is now automated with xDoctor Autopilot, which addresses most problems without requiring support involvement.
This feature is native to xDoctor 4.8-104.0 and later. For syntax and usage questions, see ECS: ObjectScale: How to run the KB automation scripts (Autopilot).
To find the rack master node:
Command:
ssh master.rack
To find the NAN IP, you can use the IP identified in the alert or obtain it from getrackinfo:
Command:
admin@ecsnode1:~> getrackinfo
Node private    Node          Public                            BMC
Ip Address      Id     Status Mac               Ip Address      Mac               Ip Address      Private.4(NAN)  Node Name
=============== ====== ====== ================= =============== ================= =============== =============== =========
192.168.219.1   1      MA     00:00:00:00:00    0.0.0.0         00:00:00:00:00    192.168.219.101 169.254.1.1     provo-red
192.168.219.2   2      SA     00:00:00:00:00    0.0.0.0         00:00:00:00:00    192.168.219.102 169.254.1.2     sandy-red
192.168.219.3   3      SA     00:00:00:00:00    0.0.0.0         00:00:00:00:00    192.168.219.103 169.254.1.3     orem-red
192.168.219.4   4      SA     00:00:00:00:00    0.0.0.0         00:00:00:00:00    192.168.219.104 169.254.1.4     ogden-red
192.168.219.5   5      SA     00:00:00:00:00    0.0.0.0         00:00:00:00:00    192.168.219.105 169.254.1.5     layton-red
192.168.219.6   6      SA     00:00:00:00:00    0.0.0.0         00:00:00:00:00    192.168.219.106 169.254.1.6     logan-red
192.168.219.7   7      SA     00:00:00:00:00    0.0.0.0         00:00:00:00:00    192.168.219.107 169.254.1.7     lehi-red
192.168.219.8   8      SA     00:00:00:00:00    0.0.0.0         00:00:00:00:00    192.168.219.108 169.254.1.8     murray-red
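To match the node IP from an alert against the getrackinfo table, the data rows can be read into a lookup keyed by NAN IP. This is a rough sketch that assumes each data row has exactly nine whitespace-separated fields in the order shown above, with the NAN IP second to last and the node name last; `parse_getrackinfo` is a name made up for this example.

```python
import ipaddress

SAMPLE_OUTPUT = """\
Ip Address Id Status Mac Ip Address Mac Ip Address Private.4(NAN) Node Name
=============== ====== ====== ================= =============== ================= =============== =============== =========
192.168.219.1 1 MA 00:00:00:00:00 0.0.0.0 00:00:00:00:00 192.168.219.101 169.254.1.1 provo-red
192.168.219.2 2 SA 00:00:00:00:00 0.0.0.0 00:00:00:00:00 192.168.219.102 169.254.1.2 sandy-red
"""


def parse_getrackinfo(output):
    """Map each NAN IP to the node's private IP, id, status and name."""
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # header lines have a different field count
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # the "=====" rule has nine fields but no IP address
        nodes[fields[7]] = {
            "private_ip": fields[0],
            "id": int(fields[1]),
            "status": fields[2],
            "name": fields[8],
        }
    return nodes


if __name__ == "__main__":
    rack = parse_getrackinfo(SAMPLE_OUTPUT)
    print(rack["169.254.1.1"]["name"])
```

Looking up the `Node` value from a RAP208 alert in this mapping gives the node name and rack color suffix needed for `--target-rack`.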
- Run the automation command from the master node with xDoctor 4.8-104.0 and later.
Note:
--target-rack is supported for this action.
# sudo xdoctor autopilot --kb 39838 --target-rack <rack_colour>
admin@ecsnode1:~> sudo xdoctor autopilot --kb 39838 --target-rack red
Checking for existing screen sessions...
Starting screen session 'autopilot_kb_39838_20250626_112318'...
Screen session 'autopilot_kb_39838_20250626_112318' started successfully.
Attaching to screen session 'autopilot_kb_39838_20250626_112318'...
Using /etc/ansible/ansible.cfg as config file
VERSION: 3.0
Playbook tasks: 47
Role tasks: 97
Total tasks: 144 across 1 host(s)
PLAY [red] ******************************************************************************************************************************************************************
Detected 8 hosts for this play.
TASK [target_check : set_fact] **********************************************************************************************************************************************
ok: [169.254.1.1 -> localhost] => {"ansible_facts": {"allowed_targets": "Please use: --target-rack", "target_node_check": false, "target_rack_check": true, "target_vdc_check": false}, "changed": false}
TASK [target_check : context] ***********************************************************************************************************************************************
skipping: [169.254.1.1] => {"changed": false, "false_condition": "node_script == false and target_node_check == true or rack_script == false and target_rack_check == true or vdc_script == false and target_vdc_check == true", "skip_reason": "Conditional result was False"}
...truncated
- Review the summary:
Example:
TASK [Print all summaries] **************************************************************************************************************************************************
ok: [169.254.1.1] => {
"msg": [
"*******************************************************************************",
"Switch xDoctor 'RAP073' password and SSH summary:",
"*******************************************************************************",
"Validated Frontend switch(es): FAIL: The passwords for the Dell managed switch(es) are incorrect and need to be configured in the xDoctor settings according to KB 39838.",
"Validated Backend switch(es): FAIL: The passwords for the Dell managed switch(es) are incorrect and need to be configured in the xDoctor settings according to KB 39838.",
"Validated Backend management connections: PASS: Management connections are up and connected to the frontend switches.",
"*******************************************************************************",
"Validated ssh keys to switch(es): PASS: All ssh keys are valid and nothing was corrected.",
"Validated xDoctor alert: PASS: Alert RAP073 was not present in xDoctor.",
"*******************************************************************************"
]
}
TASK [Set fact for context] *************************************************************************************************************************************************
ok: [169.254.1.1 -> localhost] => {"ansible_facts": {"context": " Validated Frontend switch(es): FAIL: The passwords for the Dell managed switch(es) are incorrect and need to be configured in the xDoctor settings according to KB 39838., Validated Backend switch(es): FAIL: The passwords for the Dell managed switch(es) are incorrect and need to be configured in the xDoctor settings according to KB 39838."}, "changed": false}
TASK [Fail if validation fails] *********************************************************************************************************************************************
fatal: [169.254.1.1]: FAILED! => {"changed": false, "msg": "Review the summary above for recommendations."}
NO MORE HOSTS LEFT **********************************************************************************************************************************************************
PLAY RECAP ******************************************************************************************************************************************************************
169.254.1.1 : ok=65 changed=13 unreachable=0 failed=1 skipped=73 rescued=0 ignored=1
169.254.1.2 : ok=4 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
169.254.1.3 : ok=4 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
169.254.1.4 : ok=4 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
169.254.1.5 : ok=4 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
169.254.1.6 : ok=4 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
169.254.1.7 : ok=4 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
169.254.1.8 : ok=4 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
=============================================================================================================================================================================
Status: FAIL
Time Elapsed: 0h 1m 25s
Debug log: /tmp/autopilot/log/autopilot_39838_20250626_113201.log
Message: Validated Frontend switch(es): FAIL: The passwords for the Dell managed switch(es) are incorrect and need to be configured in the xDoctor settings according to KB 39838., Validated Backend switch(es): FAIL: The passwords for the Dell managed switch(es) are incorrect and need to be configured in the xDoctor settings according to KB 39838.
=============================================================================================================================================================================
- Update the xDoctor password:
admin@ecsnode7:~> sudo xdoctor -c --expert
xDoctor Configuration Menu
--------------------------
[Expert Mode Active]
(1) Overview
(2) Scheduling
(3) Archiving
(5) Repository
(9) Miscellaneous
(0) Exit
Please make a choice: 9
xDoctor Miscellaneous
---------------------
(3) Switches
(4) Remove Hardware Alerting Timestamp
(0) Main menu
Please make a choice: 3
xDoctor Switch Settings
---------------------
Enable Switch Analysis? [Yes]:
Switches [hare,rabbit,fox,hound]:
Username [admin]:
Password [*****]:
[New Switch Settings]
Enabled = Yes
Switches = hare,rabbit,fox,hound
Username = admin
Password = *****
> Issue new settings? [No]: yes
2024-11-20 16:03:53,702: xDoctor_4.8-100.0 - INFO : Settings saved and distributed ...
xDoctor Miscellaneous
---------------------
(3) Switches
(4) Remove Hardware Alerting Timestamp
(0) Main menu
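When the Autopilot output is captured to a file, the PLAY RECAP counters shown earlier can be checked programmatically to see which host the validation failed on before updating the password. The following sketch assumes the standard Ansible recap layout of `host : key=value ...`; the function names are illustrative only.

```python
import re

# One recap line: "<host> : ok=65 changed=13 unreachable=0 failed=1 ..."
RECAP_LINE = re.compile(r"^(\S+)\s*:\s*((?:\w+=\d+\s*)+)$")

SAMPLE_RECAP = """\
PLAY RECAP ******************************************************************
169.254.1.1 : ok=65 changed=13 unreachable=0 failed=1 skipped=73 rescued=0 ignored=1
169.254.1.2 : ok=4 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
Status: FAIL
Time Elapsed: 0h 1m 25s
"""


def parse_play_recap(output):
    """Return {host: {counter: value}} for every recap line found."""
    recap = {}
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            host, counters = match.groups()
            recap[host] = {
                key: int(value)
                for key, value in (pair.split("=") for pair in counters.split())
            }
    return recap


def failed_hosts(recap):
    """Hosts with at least one failed task or reported as unreachable."""
    return sorted(
        host
        for host, counters in recap.items()
        if counters.get("failed", 0) or counters.get("unreachable", 0)
    )


if __name__ == "__main__":
    print(failed_hosts(parse_play_recap(SAMPLE_RECAP)))
```

A host listed here, together with a password FAIL message in the summary, points to the password update step shown above.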
Base KB automation:
ECS: xDoctor: RAP073: Switch Connection Failure Detected
Additional KBs consolidated into this automation:
ECS: xDoctor reports a switch connection failure due to an RSA key in known_hosts