PowerFlex: Adding a device fails with "Time budget exceeded" and IO_HARD_ERROR in the trace logs
Summary: Adding a new device to an SDS in VxFlex OS (ScaleIO) fails. The SDS trace logs show "Time budget exceeded" and IO_HARD_ERROR.
Symptoms
The customer tries to add a new device to an SDS in VxFlex OS, and the command fails with an error.
The MDM trace logs show the following while the device is being added:
04/02 07:44:39.847673 0x7fe714c7deb0:mosEventLog_PostInternal:00590: New event added. Message: "Command add_sds_device received, User: 'USERNAME'. [464880250]". Additional info: "SDS: Name: SDSNAME, SDS Device: name N/A path /dev/sdab, Storage Pool: sas_10k_4, Test Mode: 'Test and activate' Test Time: 0 force_device_takeover flag: Specified" Severity: Info
04/02 07:44:44.435265 0x7fe714c7deb0:mosEventLog_PostInternal:00590: New event added. Message: "Command add_sds_device was not successful. Error code: Add SDS device IO error [464880250]". Additional info: "ID: 0000000000000000" Severity: Warning
The SDS trace logs show the following errors:
04/02 07:44:43.589785 0x7f6b739aaeb0:AddDevProgress_Trace:00255: AddDevProgress: osThreadId=11cd, devId=3fefb6ae00110006, Start=25f2ee8825 [usec], end=25f2ee9888 [usec], interval= 4195 [usec], Desc=Write PhyDevCombArr
04/02 07:44:44.380440 0x7f6b739aaeb0:file_DoIOSyncEx:02237: Time budget exceeded on reading from /dev/sdab
04/02 07:44:44.380569 0x7f6b739aaeb0:file_ReadIntern:01169: Read error from disk 1548199195, 8 IO_HARD_ERROR
04/02 07:44:44.380653 0x7f6b739aaeb0:file_DoIOSyncEx:02237: Time budget exceeded on reading from /dev/sdab
04/02 07:44:44.380697 0x7f6b739aaeb0:file_ReadIntern:01169: Read error from disk 1548199195, 8 IO_HARD_ERROR
04/02 07:44:44.380777 0x7f6b739aaeb0:contDev_SendDeviceError:01761: Sending device error to MDM: DevId:3fefb6ae00110006 deviceName: /dev/sdab readError: TRUE WriteError: FALSE
04/02 07:44:44.415846 0x7f6b739aaeb0:file_WriteIntern:00937: Write error to disk 64, 2 IO_HARD_ERROR
04/02 07:44:44.415978 0x7f6b739aaeb0:file_WriteIntern:00937: Write error to disk 64, 2 IO_HARD_ERROR
04/02 07:44:44.416061 0x7f6b739aaeb0:contDev_SendDeviceError:01761: Sending device error to MDM: DevId:3fefb6ae00110006 deviceName: /dev/sdab readError: TRUE WriteError: TRUE
04/02 07:44:44.426672 0x7f6b739aaeb0:phyDev_InitWithBuf:02279: Io error
04/02 07:44:44.426839 0x7f6b739aaeb0:mosAsyncIO_CloseFileIntern:00503: Closing device path:/dev/sdab
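To spot this signature quickly in a large SDS trace file, the relevant lines can be counted with a small script. The following is a minimal sketch, not a Dell tool: the function name summarize_sds_trace and the summary layout are hypothetical, and the patterns are taken only from the excerpts above, so other message wordings (for example a write-side "Time budget exceeded" message) are not covered.

```python
import re
from collections import Counter

# Patterns derived only from the SDS trace excerpts in this article.
TIME_BUDGET = re.compile(r"Time budget exceeded on reading from (\S+)")
HARD_ERROR = re.compile(r"(Read|Write) error (?:from|to) disk .*IO_HARD_ERROR")
DEVICE_ERROR = re.compile(r"deviceName: (\S+) readError: (\w+) WriteError: (\w+)")


def summarize_sds_trace(lines):
    """Count the error signatures from this article in SDS trace lines.

    Hypothetical helper: returns per-device "Time budget exceeded" counts,
    global read/write IO_HARD_ERROR counts, and the last read/write error
    flags the SDS reported to the MDM for each device.
    """
    summary = {"time_budget": Counter(), "hard_errors": Counter(), "device_flags": {}}
    for line in lines:
        match = TIME_BUDGET.search(line)
        if match:
            summary["time_budget"][match.group(1)] += 1
            continue
        match = HARD_ERROR.search(line)
        if match:
            summary["hard_errors"][match.group(1).lower()] += 1
            continue
        match = DEVICE_ERROR.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last report wins.
            summary["device_flags"][match.group(1)] = {
                "read": match.group(2) == "TRUE",
                "write": match.group(3) == "TRUE",
            }
    return summary


# Sample lines copied from the excerpt above.
SAMPLE = [
    "04/02 07:44:44.380440 0x7f6b739aaeb0:file_DoIOSyncEx:02237: Time budget exceeded on reading from /dev/sdab",
    "04/02 07:44:44.380569 0x7f6b739aaeb0:file_ReadIntern:01169: Read error from disk 1548199195, 8 IO_HARD_ERROR",
    "04/02 07:44:44.380777 0x7f6b739aaeb0:contDev_SendDeviceError:01761: Sending device error to MDM: DevId:3fefb6ae00110006 deviceName: /dev/sdab readError: TRUE WriteError: FALSE",
    "04/02 07:44:44.415846 0x7f6b739aaeb0:file_WriteIntern:00937: Write error to disk 64, 2 IO_HARD_ERROR",
    "04/02 07:44:44.416061 0x7f6b739aaeb0:contDev_SendDeviceError:01761: Sending device error to MDM: DevId:3fefb6ae00110006 deviceName: /dev/sdab readError: TRUE WriteError: TRUE",
]

if __name__ == "__main__":
    print(summarize_sds_trace(SAMPLE))
```

To use it on a real log, pass an open file object instead of SAMPLE, since the function only iterates over lines. The path to the SDS trace file depends on the installation and is not given here.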
Cause
The "Time budget exceeded" events suggest that the device did not complete its I/O within the time allowed. Either the device is too slow to be added, or there is a hardware problem, such as a controller issue or a bad block on the disk.
As part of the test phase when adding a new device, ScaleIO (SIO) must be able to write ~200 MB of data in ~4.5 seconds. When "Time budget exceeded" and IO_HARD_ERROR appear together, the device did not pass this test, which points to controller issues, bad blocks on the device, or a disk that is too slow.
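The approximate figures above imply a minimum sustained write rate. The following is a small worked calculation, assuming the ~200 MB and ~4.5 second values are taken at face value; the function names are hypothetical and the result is only a rough guide, not a documented PowerFlex threshold.

```python
# Approximate values from the article; both are stated with "~" in the source.
TEST_DATA_MB = 200
TIME_BUDGET_S = 4.5


def required_throughput_mb_s(data_mb=TEST_DATA_MB, budget_s=TIME_BUDGET_S):
    """Minimum sustained rate needed to move data_mb within budget_s."""
    return data_mb / budget_s


def fits_time_budget(measured_mb_s, data_mb=TEST_DATA_MB, budget_s=TIME_BUDGET_S):
    """True if a device sustaining measured_mb_s would finish within the budget."""
    return data_mb / measured_mb_s <= budget_s


if __name__ == "__main__":
    print(f"Required: about {required_throughput_mb_s():.1f} MB/s")
```

With these numbers the device needs to sustain roughly 44 MB/s. A disk that a vendor benchmark measures well below that is unlikely to pass the add-device test.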
Resolution
This is a hardware issue rather than a VxFlex OS issue.
Engage the hardware vendor to investigate.
1. Check for issues on the RAID controller.
2. Check for issues on the disk, and check whether it is a slow disk.
3. If possible, get another disk and add it. If the new disk is added successfully, the earlier disk had problems.
4. If multiple new disks hit the same issue when being added to an SDS, check for issues on the RAID controller. Also make sure the RAID controller driver and firmware are at supported versions.
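As a rough first check for a slow disk before engaging the vendor, read throughput can be timed with a read-only script. This is a sketch under several assumptions: timed_read is a hypothetical helper, it only reads (the PowerFlex test also writes), reads may be served from the operating system page cache and so look faster than the disk really is, and opening a block device such as /dev/sdab normally requires root. It does not replace the vendor's diagnostics.

```python
import time


def timed_read(path, total_mb=200, chunk_mb=1):
    """Read up to total_mb from path and return (bytes_read, elapsed_seconds).

    Read-only, so it does not modify the device. Stops early at end of file.
    """
    chunk = chunk_mb * 1024 * 1024
    remaining = total_mb * 1024 * 1024
    bytes_read = 0
    start = time.monotonic()
    # buffering=0 disables Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as device:
        while remaining > 0:
            data = device.read(min(chunk, remaining))
            if not data:
                break  # end of file or device reached
            bytes_read += len(data)
            remaining -= len(data)
    elapsed = time.monotonic() - start
    return bytes_read, elapsed


def throughput_mb_s(bytes_read, elapsed):
    """Convert a timed_read result to MB/s (1 MB = 1024 * 1024 bytes here)."""
    if elapsed <= 0:
        return float("inf")
    return bytes_read / (1024 * 1024) / elapsed
```

For example, calling timed_read("/dev/sdab") as root and passing the result to throughput_mb_s gives a figure to compare against the rate implied by the ~200 MB in ~4.5 seconds test. An I/O error raised during the read is itself a useful data point for the hardware vendor.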