
August 17th, 2017 04:00

Disk failure simulation for ScaleIO cluster

The correct way of testing a disk failure would generally not be to remove a disk from its slot.

Pulling a working drive and reinserting it is a basic hot-plug event, and ScaleIO treats it as such: apart from triggering a rebuild while the disk is "gone," it does not simulate a drive failure to the system.

When the drive is plugged back in, the RAID controller first finds its own metadata on the disk and sees that the disk is the only member of a RAID 0 set. Because a single-disk RAID 0 device is the only source of consistency for its own data, the controller trusts the disk to hold the data it claims to have.

That logical disk is then passed up to ScaleIO, which recognizes the disk and the data on it and restores it as the same device it was before.

To document what the system does when it sees a new disk after a failure, the disk needs to be wiped after it is removed.

One way to do this is to connect the drive to another system, where it is visible as a physical disk, and zero the entire device. A RAID controller in HBA/passthrough/IT mode works for this, as does any SATA controller that is compatible with the disk.
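A minimal sketch of the zeroing step, in shell to match the commands later in this thread. It assumes a Linux host with GNU coreutils dd (for iflag=count_bytes) and util-linux blockdev; the function name zero_device is hypothetical. It overwrites every byte on the target, so double-check the device name before running it.

```shell
#!/bin/sh
# Sketch: overwrite an entire disk with zeros from a separate host.
# Assumes GNU coreutils dd (iflag=count_bytes) and util-linux blockdev.
# zero_device is a hypothetical helper name, not a ScaleIO tool.
# WARNING: this irreversibly destroys all data on the target.

zero_device() {
    target="$1"
    if [ -z "$target" ]; then
        echo "usage: zero_device /dev/sdX" >&2
        return 1
    fi

    # Total size in bytes: blockdev for block devices, wc -c for plain files.
    if [ -b "$target" ]; then
        size=$(blockdev --getsize64 "$target") || return 1
    else
        size=$(wc -c < "$target") || return 1
    fi

    # Write exactly $size zero bytes. notrunc keeps a plain file's length;
    # fsync flushes the data to the disk before dd exits.
    dd if=/dev/zero of="$target" bs=1M count="$size" \
       iflag=count_bytes conv=notrunc,fsync status=progress
}

# Example (as root, on the host the pulled drive is attached to):
#   zero_device /dev/sdX
```

dd is used instead of a plain redirect so the byte count is explicit and the data is flushed before the command returns. On a large disk, expect the run to take a long time.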

Zeroing the disk is a much better simulation of a disk replacement. Without wiping, even a newly created RAID device on the same disk may still present the same metadata as far as ScaleIO is concerned, so the replacement process you document will differ from what happens when a truly blank new disk is installed.
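Before putting the wiped drive back, it may be worth confirming that it really reads back as blank. A minimal sketch, assuming GNU cmp (for -n) and util-linux blockdev; is_blank is a hypothetical helper name. Note that it reads the whole device, so on a large disk it takes about as long as the wipe itself.

```shell
#!/bin/sh
# Sketch: check that every byte of a disk (or file) is zero, so no leftover
# RAID or ScaleIO metadata can be recognized when it is reinstalled.
# Assumes GNU cmp (-n) and util-linux blockdev; is_blank is hypothetical.

is_blank() {
    target="$1"
    if [ -b "$target" ]; then
        size=$(blockdev --getsize64 "$target") || return 2
    else
        size=$(wc -c < "$target") || return 2
    fi
    # Compare the first $size bytes with an endless stream of zeros;
    # cmp -s prints nothing and exits 0 only if all bytes match.
    cmp -s -n "$size" "$target" /dev/zero
}

# Example:
#   if is_blank /dev/sdX; then echo "blank"; else echo "NOT blank"; fi
```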


August 18th, 2017 08:00

"The correct way of testing a disk failure would generally not be to remove a disk from its slot."


How do you test the failure without removing the drive? Is there a command to mark a drive offline?

August 23rd, 2017 08:00

To mark a drive offline, run the following command as root, replacing sdx with the kernel name of the drive you want to take offline:

echo offline > /sys/block/sdx/device/state

To bring it back online, run:

echo running > /sys/block/sdx/device/state
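The two echo commands above can be wrapped in a small helper that rejects typos in the state name and fails loudly when the sysfs path does not exist. This is a sketch, not a ScaleIO tool: set_disk_state is a hypothetical name, and SYSFS_ROOT is a hypothetical override (default /sys) added only so the function can be tried against a scratch directory. On a real node it must run as root. With sudo, note that "sudo echo offline > ..." fails because the redirection is performed by your unprivileged shell; run the function from a root shell instead.

```shell
#!/bin/sh
# Sketch: set a SCSI disk's sysfs state to "offline" or "running",
# equivalent to the echo commands above. Run as root on the ScaleIO node.
# SYSFS_ROOT is a hypothetical override (default /sys) for testing only.

set_disk_state() {
    dev="$1"      # kernel name, e.g. sdb (not /dev/sdb)
    state="$2"    # "offline" or "running"
    case "$state" in
        offline|running) ;;
        *) echo "state must be offline or running" >&2; return 1 ;;
    esac
    path="${SYSFS_ROOT:-/sys}/block/$dev/device/state"
    # Refuse to continue if the state file is missing or not writable.
    if [ ! -w "$path" ]; then
        echo "cannot write $path (wrong device name, or not root?)" >&2
        return 1
    fi
    echo "$state" > "$path"
}

# Example: take sdb offline, leave time to observe the cluster, restore it.
#   set_disk_state sdb offline
#   sleep 600    # illustrative wait; adjust to your test plan
#   set_disk_state sdb running
```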
