VPLEX: VPLEX cluster shutdown script hung
Summary: This article describes how to resolve a VPLEX cluster shutdown script that hangs during the shutdown procedure.
Symptoms
When a user shuts down a VPLEX cluster (Metro or Local), the shutdown script hangs on one of the clusters (example: cluster-1) and shows the following output:
VPlexcli:/> cluster shutdown --cluster cluster-1
Warning: Shutting down a VPlex cluster may cause data unavailability. Please refer to the VPlex documentation for the recommended procedure for shutting down a cluster. To show that you understand the impact, enter 'shutdown': shutdown
You have chosen to shutdown 'cluster-1'. To confirm, enter 'cluster-1': cluster-1
Status        Description
------------  -----------------------------
In progress.  Shutdown already in progress.
NOTE: In this scenario, the user ran the shutdown script on cluster-1 and waited the recommended 3-5 minutes for the shutdown to complete, but the script is hung and does not complete.
Run the following command to display the cluster summary. In the output, cluster-1 still shows "Connected" as "true":
VPlexcli:/> cluster summary
Clusters:
Name       Cluster ID  TLA             Connected  Expelled  Operational Status  Health State
---------  ----------  --------------  ---------  --------  ------------------  ------------
cluster-1  1           FNM0019xxxxxxx  true       false     unknown             minor-failure
cluster-2  2           FNM0018xxxxxxx  true       false     ok                  ok
Islands:
Island ID  Clusters
---------  --------------------
1          cluster-1, cluster-2
NOTE: In the above scenario, the user is shutting down only cluster-1, not cluster-2, which is part of the Metro configuration.
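The symptom check above can also be done on a saved copy of the output. The following is a minimal Python sketch, not a Dell-provided tool: it assumes the `cluster summary` text has been copied from the CLI session into a string, and that the column order matches the sample in this article. The function names `parse_cluster_summary` and `shutdown_looks_hung` are hypothetical.

```python
# Sketch only: parses pasted "cluster summary" text; it does not talk to VPLEX.
# Assumes the column layout shown in this article (Connected is the 4th column).

def parse_cluster_summary(text):
    """Return ({cluster name: connected?}, {island id: [cluster names]})."""
    connected = {}
    islands = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines and the dashed separator rows
        if line == "Clusters:":
            section = "clusters"
            continue
        if line == "Islands:":
            section = "islands"
            continue
        if line.startswith("Name ") or line.startswith("Island ID"):
            continue  # skip the column header rows
        fields = line.split()
        if section == "clusters" and len(fields) >= 4:
            connected[fields[0]] = fields[3] == "true"
        elif section == "islands" and len(fields) >= 2:
            islands[fields[0]] = [name.strip(",") for name in fields[1:]]
    return connected, islands


def shutdown_looks_hung(text, cluster):
    """True if the cluster is still connected or still listed in an island."""
    connected, islands = parse_cluster_summary(text)
    in_island = any(cluster in members for members in islands.values())
    return connected.get(cluster, False) or in_island


SAMPLE = """\
Clusters:
Name Cluster ID TLA Connected Expelled Operational Status Health State
--------- ---------- -------------- --------- -------- ------------------ ------------
cluster-1 1 FNM0019xxxxxxx true false unknown minor-failure
cluster-2 2 FNM0018xxxxxxx true false ok ok
Islands:
Island ID Clusters
--------- --------------------
1 cluster-1, cluster-2
"""

if __name__ == "__main__":
    print(shutdown_looks_hung(SAMPLE, "cluster-1"))
```

The check is only meaningful after the recommended 3-5 minute wait, because a cluster that is still shutting down normally would also show as connected.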
Cause
Resolution
To resume the shutdown on the affected cluster (example: cluster-1 as shown above), reboot the management server of that cluster by following the steps below, then run the shutdown procedure again. Rebooting the management server kills the hung script so that the command can be run again. The reboot has no other impact; it only terminates the existing running process.
1) Reboot the affected management server (ex. cluster-1) by following KB article 330568 : VPLEX: How to reboot the VPLEX management server.
2) Confirm that the affected management server is back online by logging in to a CLI session again. Once confirmed, resume the shutdown script on the affected cluster (ex. cluster-1) as follows:
Login as: service
Using keyboard-interactive authentication.
Password:
service@ManagementServer:~> vplexcli
Trying ::1...
Connected to localhost.
Escape character is '^]'.
creating logfile:/var/log/VPlex/cli/session.log_service_localhost_T23474_20190XXXXXXXXX
VPlexcli:/> cluster shutdown --cluster cluster-1
Warning: Shutting down a VPlex cluster may cause data unavailability. Please refer to the VPlex documentation for the recommended procedure for shutting down a cluster. To show that you understand the impact, enter 'shutdown': shutdown
You have chosen to shutdown 'cluster-1'. To confirm, enter 'cluster-1': cluster-1
Status        Description
------------  -----------------------------
In progress.  Shutdown already in progress.
3) Wait the recommended 3-5 minutes for the shutdown to complete. Run the following command to check the cluster status and confirm that "operational-status" for the affected cluster (ex. cluster-1) is "not-running":
VPlexcli:/> cluster status
Cluster cluster-1
operational-status: not-running
transitioning-indications:
transitioning-progress:
health-state: unknown
health-indications:
local-com: failed to validate local-com: Firmware command
error.
communication error recently.
Cluster cluster-2
operational-status: degraded
transitioning-indications: suspended exports,suspended volumes
transitioning-progress:
health-state: minor-failure
health-indications: 37 suspended Devices
6 unhealthy Devices or storage-volumes
storage-volume unreachable
local-com: ok
4) Run the following command to display the cluster summary. Confirm that "Connected" shows "false" for the affected cluster (ex. cluster-1) and that the island lists only the running cluster (ex. cluster-2):
VPlexcli:/> cluster summary
Clusters:
Name       Cluster ID  TLA             Connected  Expelled  Operational Status  Health State
---------  ----------  --------------  ---------  --------  ------------------  ------------
cluster-1  1           FNM0019xxxxxxx  false      -         -                   -
cluster-2  2           FNM0018xxxxxxx  true       false     ok                  ok
Islands:
Island ID  Clusters
---------  --------------------
1          cluster-2
5) If the issue persists, follow these (Level 50 Internal) KB articles and check whether they resolve the issue:
000093239
000027950
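The "operational-status" check in step 3 can be sketched in Python for cases where the `cluster status` output has been saved from the CLI session. This is an illustrative sketch under assumptions, not part of the VPLEX tooling: it works only on pasted text, it relies on the "Cluster <name>" and "operational-status:" line layout shown in step 3, and the function names `operational_status` and `is_shut_down` are hypothetical.

```python
# Sketch only: reads pasted "cluster status" text; it does not query VPLEX.
# Assumes each cluster block starts with a "Cluster <name>" line, as in step 3.

def operational_status(text):
    """Map each cluster name to its reported operational-status value."""
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Cluster "):
            current = line.split(None, 1)[1]
        elif line.startswith("operational-status:") and current is not None:
            result[current] = line.split(":", 1)[1].strip()
    return result


def is_shut_down(text, cluster):
    """True when the named cluster reports operational-status not-running."""
    return operational_status(text).get(cluster) == "not-running"


SAMPLE = """\
Cluster cluster-1
operational-status: not-running
transitioning-indications:
transitioning-progress:
health-state: unknown
Cluster cluster-2
operational-status: degraded
transitioning-indications: suspended exports,suspended volumes
transitioning-progress:
health-state: minor-failure
"""

if __name__ == "__main__":
    for name, status in operational_status(SAMPLE).items():
        print(name, status)
```

A cluster missing from the output is treated as not confirmed shut down, so the check fails safe and prompts a manual look at the CLI.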
Additional Information
| https://downloads.dell.com/TranslatedPDF/AR-SA_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/DE_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/ES_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/ES-XL_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/FR_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/IT_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/JA_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/KO_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/NL_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/PT_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/PT-BR_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/RU_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/SV_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/ZH-CN_536538.pdf |
| https://downloads.dell.com/TranslatedPDF/ZH-TW_536538.pdf |