
June 15th, 2015 09:00

How to forcefully remove an SDS or SDS device?

Greetings,

I'm trying to remove an SDS device. This is on a VMware ScaleIO install. I may have physically removed the HDD before letting ScaleIO finish removing the SDS device earlier. Now I'm not able to remove it via the scli. I've tried the following command, and it gets stuck removing the errored device.

scli --remove_sds_device --sds_id --device_id --force

[Screenshot attached: error.JPG]

Is there any way to forcefully delete this SDS from ScaleIO? I'm trying to start fresh on this one server, but it seems I can't remove the SDS without this device being removed first.

June 21st, 2015 00:00

1. First, try clearing the device error using the following command:
scli --clear_sds_device_error --sds_ip [SDS_IP] --clear_all

2. Then run the remove_sds_device command again (with or without the force flag).
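Putting it together, the sequence would look roughly like this (substitute your own SDS IP, SDS ID, and device ID):

scli --clear_sds_device_error --sds_ip <SDS_IP> --clear_all
scli --remove_sds_device --sds_id <SDS_ID> --device_id <DEVICE_ID>

If the remove still hangs, try adding --force to the second command.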

June 22nd, 2015 11:00

What is the exact command you are running and what is the exact error message?


June 22nd, 2015 11:00

Yes, the errored drive is in a storage pool. There are currently mapped volumes in this storage pool, and it has many other SDS devices in it.


June 22nd, 2015 11:00

No luck. I tried clearing the error in the GUI as well; it just goes back to an error state right away, and I still cannot remove the drive. Maybe the only solution is to wipe everything and start fresh.

June 22nd, 2015 11:00

If this device was added to a storage pool, is there any volume or snapshot created from that storage pool that has not been deleted, and are you trying to delete the last disk from the storage pool?


June 22nd, 2015 11:00

Yes it is.

June 22nd, 2015 11:00

Is this the last device in the Storage pool that you want to remove?


June 22nd, 2015 11:00

[Screenshot attached: remove_pending.JPG]

I am trying to remove an errored SDS device. There's no error after issuing the removal command; it's just stuck in the "Remove Pending" state. The drive has already been physically removed from the server and unmapped in VMware.

June 22nd, 2015 11:00

Note: If the capacity of this SDS is still used by volumes, and cannot be replaced, the command will fail.


June 22nd, 2015 13:00

Allen,

Before trying the command Tomer has advised, did you run abort_remove_sds_device?
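If not, it would look something like this (I'm assuming it takes the same identifiers as the remove command, so adjust to your setup):

scli --abort_remove_sds_device --sds_id <SDS_ID> --device_id <DEVICE_ID>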


June 22nd, 2015 14:00

Yes, I did. After aborting the SDS device removal and clearing the error, it still goes back to an errored state. I tried adding another device in order to remove the errored one, but it's saying it needs at least one SDS device.
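For reference, the add command I tried was along these lines (typing from memory, so the exact flag names may differ on your version; check scli --help):

scli --add_sds_device --sds_ip <SDS_IP> --device_path /dev/sdX --storage_pool_name <POOL_NAME>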

1 Attachment

June 23rd, 2015 05:00

A few more questions:

1. How many SDSs did you have in your setup? What was the capacity of each SDS?

2. Is this the sequence of operations that got you into this state?

a. You performed the remove SDS device operation (from which interface: CLI, GUI, or the vSphere plugin?). By the way, was that the last SDS device at the time? If it was, and there were volumes on that SDS, it should not have allowed you to remove the last SDS device.

b. You did not wait for the rebalance to complete and removed the HDD, then got a device error alert?

Can you please send us the ShowEvents printout from both MDMs?

Go to /opt/emc/scaleio/mdm/bin and run the ./showevents.py script (on both MDMs).
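For example, on each MDM you can redirect the output to a file and attach it here:

cd /opt/emc/scaleio/mdm/bin
./showevents.py > /tmp/mdm_events.txt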

Can you send a picture of the GUI dashboard? How much spare capacity do you have, and was the spare capacity configured properly?

Thanks,

Tomer


June 23rd, 2015 07:00

1. Six SDSs: four servers with 10 TB each and two servers with 16 TB each.

2. Yes. I issued the remove command in the scli to remove 6 drives, 2 each from 3 SDSs. The drives were probably physically removed within an hour or two of issuing the command. Now I want to remove the 3rd SDS entirely, so in vCenter I removed all the Mapped Raw LUNs mapped to this SVM.

2a. From the CLI. I realize now that it would have been safer to remove them via the vSphere plugin. It was not the last SDS device at the time.

2b. That's correct; I issued the command and didn't watch the rebalance to confirm that everything had completed. I thought the removal would be quick and hoped the data could be recovered from the other SDSs. It looks like around 5.1 GB of data had not yet been rebalanced when the drive was physically removed.

3. Sure, please see the attachment.

1 Attachment

June 23rd, 2015 08:00

How many PDs (protection domains) and SPs (storage pools) do you have?


Can you please also supply the following:

1. Screenshots of the GUI Backend view and the Alerts view.

2. MDM logs (from both MDMs) plus the relevant SDS logs. You can run the getInfo script under /opt/emc/scaleio/mdm/diag (and the same for the SDS); it collects all the logs on the host on which you run it.

If you have a SIO-GW (ScaleIO Gateway) installed, you can also log in to the IM-Web and, from the Maintain view, run GetInfo, which will collect logs from all the nodes in your SIO system.
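For example (the exact script name may differ slightly between versions):

cd /opt/emc/scaleio/mdm/diag
./getInfo.sh

Do the same under the SDS diag directory (e.g. /opt/emc/scaleio/sds/diag) on the SDS node; each run gathers the logs from the host it is run on so you can attach them.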


Did you try restarting the SDS / MDM since you got into this state, to check whether it clears the problem?

pkill sds / mdm

or

under /opt/emc/scaleio/sds/bin/, run delete_service and then create_service

Do the same for the MDM (note: this will trigger an MDM switch-over).
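In other words, something along these lines on the affected node (script names are from memory, so check the bin directory on your install):

pkill sds    (to restart the SDS process)

or, to recreate the service:

cd /opt/emc/scaleio/sds/bin
./delete_service.sh
./create_service.sh

Do the equivalent under /opt/emc/scaleio/mdm/bin for the MDM, keeping the switch-over in mind.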


June 24th, 2015 06:00

There's 1 PD and 3 storage pools.

Please check your email for the screenshots and logs.

I did try restarting the SVM for the 3rd SDS; no luck.

Thanks!
