Unable to remove a failed SDS
A server that was working as an SDS failed catastrophically. I want to remove it from my ScaleIO cluster, but I cannot.
I tried to remove the SDS with scli, but now it is stuck in Remove-Pending state:
root@scaleio-1-1:~# scli --query_all_sds
Query-all-SDS returned 5 SDS nodes.
Protection Domain f62d8b9500000000 Name: scaleio-domain-1
SDS ID: ff1a1edf00000004 Name: scaleio-1-1 State: Connected, Joined IP: 192.168.34.22,192.168.35.22,192.168.36.23,192.168.37.23 Port: 7072 Version: 2.0.14000
SDS ID: ff1a1ede00000003 Name: scaleio-2-1 State: Connected, Joined IP: 192.168.34.21,192.168.35.21,192.168.36.22,192.168.37.22 Port: 7072 Version: 2.0.14000
SDS ID: ff1a1edd00000002 Name: scaleio-8-1 State: Remove-Pending, Disconnected, Decoupled IP: 192.168.34.28,192.168.35.28,192.168.36.24,192.168.37.24 Port: 7072 Version: 2.0.14000
SDS ID: ff1a1edc00000001 Name: scaleio-7-1 State: Connected, Joined IP: 192.168.34.27,192.168.35.27,192.168.36.20,192.168.37.20 Port: 7072 Version: 2.0.14000
SDS ID: ff1a1edb00000000 Name: scaleio-0-1 State: Connected, Joined IP: 192.168.34.20,192.168.35.20,192.168.36.21,192.168.37.21 Port: 7072 Version: 2.0.14000
Retrying the removal fails, with or without --force, because the MDM reports that the SDS is already being removed:
root@scaleio-1-1:~# scli --remove_sds --sds_name scaleio-8-1
Error: MDM failed command. Status: SDS is being removed
root@scaleio-1-1:~# scli --remove_sds --sds_name scaleio-8-1 --force
Removing the an SDS might leave some data unprotected in case of failure. Press 'y' and then Enter to confirm: y
Error: MDM failed command. Status: SDS is being removed
I also tried clearing all device errors. The command was accepted, but nothing changed:
root@scaleio-1-1:~# scli --clear_sds_device_error --sds_name scaleio-8-1 --clear_all
Successfully cleared all SDS scaleio-8-1 devices
I was able to put the SDS into maintenance mode:
root@scaleio-1-1:~# scli --enter_maintenance_mode --sds_name scaleio-8-1
Set Maintenance Mode Results:
SDS scaleio-8-1: Success
root@scaleio-1-1:~# scli --query_all_sds
Query-all-SDS returned 5 SDS nodes.
Protection Domain f62d8b9500000000 Name: scaleio-domain-1
SDS ID: ff1a1edf00000004 Name: scaleio-1-1 State: Connected, Joined IP: 192.168.34.22,192.168.35.22,192.168.36.23,192.168.37.23 Port: 7072 Version: 2.0.14000
SDS ID: ff1a1ede00000003 Name: scaleio-2-1 State: Connected, Joined IP: 192.168.34.21,192.168.35.21,192.168.36.22,192.168.37.22 Port: 7072 Version: 2.0.14000
SDS ID: ff1a1edd00000002 Name: scaleio-8-1 State: Remove-Pending, Disconnected, Decoupled IP: 192.168.34.28,192.168.35.28,192.168.36.24,192.168.37.24 Port: 7072 IN_MAINTENANCE Version: 2.0.14000
SDS ID: ff1a1edc00000001 Name: scaleio-7-1 State: Connected, Joined IP: 192.168.34.27,192.168.35.27,192.168.36.20,192.168.37.20 Port: 7072 Version: 2.0.14000
SDS ID: ff1a1edb00000000 Name: scaleio-0-1 State: Connected, Joined IP: 192.168.34.20,192.168.35.20,192.168.36.21,192.168.37.21 Port: 7072 Version: 2.0.14000
However, that changed nothing, and now I cannot even exit maintenance mode because the SDS is not operational:
root@scaleio-1-1:~# scli --exit_maintenance_mode --sds_name scaleio-8-1
Exit Maintenance Mode Results:
SDS scaleio-8-1: At least one SDS is not in normal operational state. Please check that all SDSs in the Protection Domain are up, and running normally.
Error: MDM failed command. Status: At least one SDS is not in normal operational state. Please check that all SDSs in the Protection Domain are up, and running normally.
How can I make ScaleIO forget everything about that particular SDS?
pawelw1
April 18th, 2018 04:00
Hi,
There can be many reasons for this behavior. You can try following this KB: link and see if it helps; also try switching the MDM ownership. If that doesn't work, it's best to open a Service Request so we can have a look at the SDS and MDM logs.
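If you do open a Service Request, it may help to attach a short summary of which SDSs are not in a normal state. Below is a minimal sketch in Python that parses text in the `scli --query_all_sds` format pasted in this thread. It is not an official tool: the regular expression, the `Sds` type, and the function names `parse_query_all_sds` and `abnormal` are my own assumptions for illustration, and the output layout may differ between ScaleIO versions.

```python
import re
from typing import List, NamedTuple, Tuple

# Hypothetical pattern, derived only from the output pasted in this thread.
SDS_LINE = re.compile(
    r"SDS ID:\s*(?P<sds_id>\S+)\s+Name:\s*(?P<name>\S+)\s+"
    r"State:\s*(?P<state>.*?)\s+IP:\s*(?P<ips>\S+)\s+Port:\s*(?P<port>\d+)"
)


class Sds(NamedTuple):
    sds_id: str
    name: str
    states: Tuple[str, ...]
    ips: Tuple[str, ...]
    in_maintenance: bool


def parse_query_all_sds(text: str) -> List[Sds]:
    """Extract one Sds record per 'SDS ID:' line; other lines are skipped."""
    records = []
    for line in text.splitlines():
        m = SDS_LINE.search(line)
        if m is None:
            continue
        records.append(
            Sds(
                sds_id=m["sds_id"],
                name=m["name"],
                states=tuple(s.strip() for s in m["state"].split(",")),
                ips=tuple(m["ips"].split(",")),
                in_maintenance="IN_MAINTENANCE" in line,
            )
        )
    return records


def abnormal(records: List[Sds]) -> List[Sds]:
    """Return SDSs that are in maintenance or not exactly 'Connected, Joined'."""
    return [
        r for r in records
        if r.in_maintenance or r.states != ("Connected", "Joined")
    ]


# Two lines copied from the output earlier in this thread.
SAMPLE = """\
Protection Domain f62d8b9500000000 Name: scaleio-domain-1
SDS ID: ff1a1edf00000004 Name: scaleio-1-1 State: Connected, Joined IP: 192.168.34.22,192.168.35.22,192.168.36.23,192.168.37.23 Port: 7072 Version: 2.0.14000
SDS ID: ff1a1edd00000002 Name: scaleio-8-1 State: Remove-Pending, Disconnected, Decoupled IP: 192.168.34.28,192.168.35.28,192.168.36.24,192.168.37.24 Port: 7072 IN_MAINTENANCE Version: 2.0.14000
"""

if __name__ == "__main__":
    for sds in abnormal(parse_query_all_sds(SAMPLE)):
        flag = " (in maintenance)" if sds.in_maintenance else ""
        print(f"{sds.name} [{sds.sds_id}]: {', '.join(sds.states)}{flag}")
```

To use it on a live system, save the output of `scli --query_all_sds` to a file and pass the file contents to `parse_query_all_sds` instead of `SAMPLE`.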
Cheers,
Pawel