PowerFlex: CloudLink-encrypted devices marked Error after SDS reboots
Summary: Some CloudLink-encrypted SDS devices are marked Error after the SDS node reboots.
Symptoms
After an SDS node reboots, some CloudLink-encrypted devices continue to function while the rest are marked as Error.
Multiple devices on the rebooted SDS are in the "Error" state. This can reduce available capacity and may lengthen rebuild time.
In the SDS trc log:
21/06 13:09:34.917236 0x7fed52ebbeb0:mosAsyncIO_OpenFileEx:00376: WARNING: Failed to open IO file /dev/mapper/svm_sdg with rc 3
21/06 13:09:34.917238 0x7fed52ebbeb0:file_OpenEx:00707: Open error /dev/mapper/svm_sdg, NOT_FOUND
21/06 13:09:34.917241 0x7fed52ebbeb0:phyDev_ReadDevId:02649: failed to open file: path=/dev/mapper/svm_sdg, NOT_FOUND
The SDS cfg/partitions file contains a mix of entries: one or more device names prefixed with mapper/svm_sd and one or more with no prefix.
The following example shows both types of entries:
8 96 1875374424 mapper/svm_sdg
8 112 1875374424 sdh
The device errors can be cleared after the affected SDS has been restarted.
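To spot the mixed state described above, a small shell function can classify each partitions entry by whether its device name carries the mapper/svm_ prefix. This is a sketch, not a Dell tool: the function name classify_sds_partitions is hypothetical, it assumes the four whitespace-separated columns shown in the example (major, minor, size in blocks, device name) with one entry per line, and the full location of cfg/partitions on the node should be confirmed before use.

```shell
# Hypothetical helper: classify entries of an SDS cfg/partitions file.
# Assumed format, one entry per line: major minor size_in_blocks device_name
classify_sds_partitions() {
  local file="$1"
  awk '
    # Only consider lines whose first field is numeric (skips any header).
    NF >= 4 && $1 ~ /^[0-9]+$/ {
      if ($4 ~ /^mapper\/svm_/) { enc++;   print $4 ": CloudLink mapper device" }
      else                      { plain++; print $4 ": no mapper prefix" }
    }
    # Both kinds present is the pattern described in this article.
    END { if (enc > 0 && plain > 0) print "WARNING: mixed mapper and non-mapper entries" }
  ' "$file"
}
```

For example, `classify_sds_partitions /opt/emc/scaleio/sds/cfg/partitions`, where that path is an assumed install location.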
The ScaleIO version in use is later than 2.0.1.3, so KB 000158993 does not apply even though its symptom is similar.
Cause
Because of an issue with its disk check, CloudLink unlocks only some of the devices before the SDS starts. When the SDS starts, the /dev/mapper/svm_sd* device files for the disks that are not yet unlocked do not exist, so the SDS reports device errors for them.
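Because the errors come from mapper device files that do not exist yet when the SDS starts, a check can list which mapper/svm_ entries in the partitions file have no device node, and optionally poll until they all appear. This is a minimal sketch under stated assumptions: the function names are hypothetical, the partitions format is assumed to match the example above, and it only covers devices already listed with the mapper prefix, so "svm status" remains the authoritative view of which disks CloudLink has unlocked.

```shell
# Hypothetical helper: print each mapper/svm_* device listed in a partitions
# file whose device node does not exist yet under DEV_ROOT (default /dev).
missing_svm_devices() {
  local file="$1" dev_root="${2:-/dev}"
  awk 'NF >= 4 && $1 ~ /^[0-9]+$/ && $4 ~ /^mapper\/svm_/ { print $4 }' "$file" |
    while read -r name; do
      # This is the path the SDS would fail to open with NOT_FOUND.
      [ -e "$dev_root/$name" ] || echo "$dev_root/$name"
    done
}

# Hypothetical helper: poll once per second until no listed mapper device is
# missing, or until TIMEOUT seconds pass. Returns 0 on success, 1 on timeout.
wait_for_svm_devices() {
  local file="$1" timeout="$2" dev_root="${3:-/dev}" waited=0 missing
  while :; do
    missing=$(missing_svm_devices "$file" "$dev_root")
    [ -z "$missing" ] && return 0
    if [ "$waited" -ge "$timeout" ]; then
      echo "Still missing after ${timeout}s:" >&2
      echo "$missing" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

After a reboot this could gate the manual start, for example `wait_for_svm_devices /opt/emc/scaleio/sds/cfg/partitions 600 && /opt/emc/scaleio/sds/bin/create_service.sh`, where the partitions path and the 600-second limit are assumptions to adjust.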
Resolution
This issue is fixed in CloudLink 6.8. Before upgrading, consult the support matrix to confirm that the resulting environment is supported.
To recover from this issue, manually restart the SDS service and clear the affected SDS device errors.
To prevent the issue, start the SDS only after CloudLink has unlocked all the disks, using one of the following methods:
- Before shutting down the node, run "/opt/emc/scaleio/sds/bin/delete_service.sh". After the reboot, wait until all the disks used as SDS devices are unlocked (as shown by "svm status" or the CloudLink control center), and then run "/opt/emc/scaleio/sds/bin/create_service.sh".
OR
- Edit "/opt/emc/extra/pre_run.sh" and insert "sleep 30" above the last line. If the issue still occurs (that is, not all devices are unlocked when the SDS process starts), increase the value above 30. The end of the edited file looks like this:
...
sleep 30
echo pre_run returned...$(date) >> /var/log/svm-sds
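The pre_run.sh edit can also be scripted, which helps when it has to be reapplied. This is a sketch under assumptions: the function name add_pre_run_sleep is hypothetical, it relies on GNU sed for the one-line "$i text" insert form, and it skips the edit if any bare "sleep N" line is already present, so check the file manually if it contains an unrelated sleep.

```shell
# Hypothetical helper: insert "sleep SECONDS" above the last line of a script
# such as /opt/emc/extra/pre_run.sh, keeping a .bak copy of the original.
# Requires GNU sed (one-line "$i text" form and in-place -i).
add_pre_run_sleep() {
  local script="$1" seconds="${2:-30}"
  # Do not stack delays if a sleep line was already added.
  if grep -qE '^[[:space:]]*sleep[[:space:]]+[0-9]+[[:space:]]*$' "$script"; then
    echo "sleep already present in $script; not modifying" >&2
    return 0
  fi
  cp -p "$script" "$script.bak" || return 1
  # "$i" = insert the given text before the last line.
  sed -i "\$i sleep $seconds" "$script"
}
```

For example, `add_pre_run_sleep /opt/emc/extra/pre_run.sh 30`, rerun with a larger value (after removing the existing sleep line) if devices are still locked when the SDS starts.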
Additional Information
NOTE: The sleep workaround above is not preserved across an upgrade to 6.7 (which does not yet include the permanent fix for this issue) and may need to be applied again.
This is not a ScaleIO software issue. The issue is in CloudLink 6.6 and 6.7 and is fixed in CloudLink 6.8.
Keep #CCTFY25Q4 as the keyword in all versions for tracking purposes.