PowerFlex SDS Process Continuously Panics at Function mosAsyncIO_ReqAccounting
Summary: During a manual or PFMP SVM conversion and a manual upgrade of the PowerFlex component, the SDS process might panic continuously if the rep_tgt.txt file was not created properly.
Symptoms
The SDS process continuously panics with the following stack trace:
2024/05/27 08:11:10.051615 Panic in file /data/build/workspace/ScaleIO-Common-Job/src/mos/usr/linux/mos_async_io.c, line 1107, function mosAsyncIO_ReqAccounting, PID 21157.
Panic Expression pOsReq->accounting.totalLenSubmittedBytes == pReq->bytesIO
/opt/emc/scaleio/sds/bin/sds-4.5.2000.135(mosDbg_PanicPrepare+0xf4) [0x936f74]
/opt/emc/scaleio/sds/bin/sds-4.5.2000.135(mosAsyncIO_ReqAccounting+0x26b) [0x95398b]
/opt/emc/scaleio/sds/bin/sds-4.5.2000.135() [0x953b4e]
/opt/emc/scaleio/sds/bin/sds-4.5.2000.135(mosAsyncIO_Reaper+0xab8) [0x959dc8]
/opt/emc/scaleio/sds/bin/sds-4.5.2000.135(mosOsThrd_StartFunc+0x15a) [0x94056a]
/lib64/libpthread.so.0(+0xa6ea) [0x7f0629c166ea]
[(nil)]
Before the SDS panic, the SDS trace logs indicate an issue while reading from /opt/emc/scaleio/sds/cfg/rep_tgt.txt:
2024/05/27 08:10:36.501247 LOW:7fa41442ddb0:mos_ReadParamFromSysPath:01442: ERROR: Failed to stat sys file /sys/dev/block/0:55/partition, errno: 2
2024/05/27 08:10:36.501253 MED:7fa41442ddb0:mos_GetDevMaxIoSizeBytesFromFD:01565: Could not read parameter for file 28 (path /sys/dev/block/0:55/partition), assuming 256.
2024/05/27 08:10:36.501260 MED:7fa41442ddb0:mosAsyncIO_OpenFileEx:00463: Opened file /opt/emc/scaleio/sds/cfg/rep_tgt.txt (fd 28), maxInflight 8, maxIoSize 256, ptr 0x7fa42c14a450
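Both symptoms can be confirmed on a node by searching a trace log for the panic signature and for the rep_tgt.txt open message. The sketch below is illustrative only: the function name check_sds_panic is hypothetical, and the log file is passed as an argument because this article does not state where the SDS trace log is stored.

```shell
#!/bin/sh
# Hypothetical helper (not part of PowerFlex): report whether a log file
# contains the panic signature described in the Symptoms section.
# The log path is supplied by the caller; no default location is assumed.
check_sds_panic() {
    log_file=$1
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # -F matches a fixed string, -c counts matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # helper usable under "set -e".
    panics=$(grep -F -c 'function mosAsyncIO_ReqAccounting' "$log_file" || true)
    echo "panic lines: $panics"
    # Show any lines that mention rep_tgt.txt (for example the open message).
    grep -F 'rep_tgt.txt' "$log_file" || true
    # Succeed only if at least one panic line was found.
    [ "$panics" -gt 0 ]
}
```

Usage: check_sds_panic <path-to-trace-log>. The function returns 0 when the panic signature is present, 1 when it is absent, and 2 when the file cannot be read.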
Impact
Simultaneous panics in multiple SDS nodes could lead to Data Unavailable events.
Cause
The SDS fails when it attempts to access the rep_tgt.txt file. On SLES, the SDS daemon expects this file to be 4096 bytes in size.
Resolution
If rep_tgt.txt Does Not Exist:
- Retrieve the required IDs from the Primary MDM:
  - Get MDM_ID:
    scli --query_all | grep ID | head -n1
  - Get SDS_ID (find the SDS in question in the output):
    scli --query_all_sds
- Create the rep_tgt.txt file with the retrieved IDs. Replace <MDM_ID> and <SDS_ID> with the values from the output above:
  echo -n "mdmId=<MDM_ID>,tgtId=<SDS_ID>" > /opt/emc/scaleio/sds/cfg/rep_tgt.txt
  truncate -s 4096 /opt/emc/scaleio/sds/cfg/rep_tgt.txt
  Example:
  echo -n "mdmId=e7db67b7c2e2190f,tgtId=2514c01a00000003" > /opt/emc/scaleio/sds/cfg/rep_tgt.txt
  truncate -s 4096 /opt/emc/scaleio/sds/cfg/rep_tgt.txt
- Start the SDS daemon:
  /opt/emc/scaleio/sds/bin/create_service.sh
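The file-creation commands above can be wrapped in a small function so that the write and the padding always happen together. This is a minimal sketch, not a vendor-supplied tool: the function name create_rep_tgt and its argument order are hypothetical, and the target path is a parameter so the function can be tried on a scratch file before it is pointed at /opt/emc/scaleio/sds/cfg/rep_tgt.txt.

```shell
#!/bin/sh
# Hypothetical helper (not part of PowerFlex): write
# "mdmId=<MDM_ID>,tgtId=<SDS_ID>" to a file and pad it to 4096 bytes.
create_rep_tgt() {
    target=$1
    mdm_id=$2
    sds_id=$3
    if [ -z "$target" ] || [ -z "$mdm_id" ] || [ -z "$sds_id" ]; then
        echo "usage: create_rep_tgt <path> <MDM_ID> <SDS_ID>" >&2
        return 2
    fi
    # This procedure is for a missing file, so refuse to overwrite one.
    if [ -e "$target" ]; then
        echo "$target already exists; not overwriting" >&2
        return 1
    fi
    # printf '%s' writes the string without a trailing newline,
    # which matches the "echo -n" used in the manual steps.
    printf '%s' "mdmId=${mdm_id},tgtId=${sds_id}" > "$target" || return 1
    # Extend the file with zero bytes up to 4096 bytes.
    truncate -s 4096 "$target" || return 1
    # Report the resulting size; arithmetic expansion strips any padding
    # that some wc implementations add around the number.
    size=$(( $(wc -c < "$target") ))
    echo "created $target ($size bytes)"
    [ "$size" -eq 4096 ]
}
```

Usage with the example IDs from this article: create_rep_tgt /opt/emc/scaleio/sds/cfg/rep_tgt.txt e7db67b7c2e2190f 2514c01a00000003. The SDS daemon still has to be started afterwards as described in the last step.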
If rep_tgt.txt Exists:
- Verify that the correct MDM ID and SDS ID are in the file:
  cat /opt/emc/scaleio/sds/cfg/rep_tgt.txt
- Check the file size (must be 4096 bytes):
  ls -l /opt/emc/scaleio/sds/cfg/rep_tgt.txt
- If the file is smaller than 4096 bytes:
  - Back up the existing file:
    cp /opt/emc/scaleio/sds/cfg/rep_tgt.txt /opt/emc/scaleio/sds/cfg/rep_tgt.txt.bak
  - Resize the file:
    truncate -s 4096 /opt/emc/scaleio/sds/cfg/rep_tgt.txt
  - Verify the new file size.
- Start the SDS daemon:
  /opt/emc/scaleio/sds/bin/create_service.sh
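The check-and-resize steps for an existing file can be sketched in the same way. The function name fix_rep_tgt_size is hypothetical and the path is a parameter, both assumptions made for illustration. The function pads only files smaller than 4096 bytes, after taking a .bak copy, and leaves larger files untouched because this article does not describe a procedure for them.

```shell
#!/bin/sh
# Hypothetical helper (not part of PowerFlex): show the contents of an
# existing rep_tgt.txt, check its size, and pad it to 4096 bytes if smaller.
fix_rep_tgt_size() {
    target=$1
    if [ ! -f "$target" ]; then
        echo "$target not found" >&2
        return 2
    fi
    size=$(( $(wc -c < "$target") ))
    # Print the readable part of the file (padding zero bytes removed)
    # so the MDM ID and SDS ID can be checked by eye.
    echo "content: $(tr -d '\000' < "$target")"
    if [ "$size" -eq 4096 ]; then
        echo "size OK (4096 bytes)"
        return 0
    fi
    if [ "$size" -gt 4096 ]; then
        echo "unexpected size $size bytes; not modifying" >&2
        return 1
    fi
    # Smaller than 4096 bytes: back up first, then pad with zero bytes.
    cp -p "$target" "$target.bak" || return 1
    truncate -s 4096 "$target" || return 1
    size=$(( $(wc -c < "$target") ))
    echo "resized to $size bytes (backup: $target.bak)"
    [ "$size" -eq 4096 ]
}
```

Usage: fix_rep_tgt_size /opt/emc/scaleio/sds/cfg/rep_tgt.txt. The function does not validate the IDs themselves; compare the printed content with the scli output from the Primary MDM before starting the SDS daemon.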
Impacted Version
PowerFlex 3.x
PowerFlex 4.x