ScaleIO: Root Cause of "sds_receive_buffer_alloc_failures"
I have a 3-node ScaleIO 2.0.0.2 cluster.
The nodes are SDS1_S1 (MDM Slave), SDS2_S2 (MDM Master) and SDS3_S3 (MDM Tie-Breaker).
The log on node SDS3_S3 shows:
sds_receive_buffer_alloc_failures:
Short window: 43100 failures in 60 seconds (limit is 20000). Last failure: 2017-02-09 13:25:16
Medium window: 509701 failures in 3600 seconds (limit is 200000). Last failure: 2017-02-09 13:25:16
What is the source of these failures and how does one correct and eliminate these errors?
Nodes SDS1_S1 and SDS2_S2 do not show any failures.
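For anyone comparing these counters across nodes, here is a minimal Python sketch that pulls the numbers out of the two window lines quoted above and reports how far each window is over its limit. The regular expression is an assumption based only on the lines pasted in this post; other ScaleIO versions may word the message differently, and the function name `summarize_windows` is made up for illustration.

```python
import re

# Pattern modelled on the two window lines quoted in this thread (assumption:
# other versions may format the message differently).
WINDOW_RE = re.compile(
    r"(?P<name>\w+) window: (?P<failures>\d+) failures in (?P<seconds>\d+) seconds "
    r"\(limit is (?P<limit>\d+)\)"
)

SAMPLE_LOG = """\
sds_receive_buffer_alloc_failures:
Short window: 43100 failures in 60 seconds (limit is 20000). Last failure: 2017-02-09 13:25:16
Medium window: 509701 failures in 3600 seconds (limit is 200000). Last failure: 2017-02-09 13:25:16
"""


def summarize_windows(text):
    """Return one dict per window with the failure rate and the ratio to the limit."""
    rows = []
    for m in WINDOW_RE.finditer(text):
        failures = int(m["failures"])
        seconds = int(m["seconds"])
        limit = int(m["limit"])
        rows.append({
            "window": m["name"],
            "failures": failures,
            "seconds": seconds,
            "limit": limit,
            "per_second": failures / seconds,   # average failures per second
            "over_limit": failures / limit,     # > 1.0 means the limit is exceeded
        })
    return rows


if __name__ == "__main__":
    for row in summarize_windows(SAMPLE_LOG):
        print(f'{row["window"]:<7} {row["per_second"]:8.1f}/s  {row["over_limit"]:.2f}x limit')
```

On the values above, both windows come out at more than twice their limits (about 2.2x for the short window and about 2.5x for the medium one), so the failures are sustained rather than a single burst.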
Thanks,
CCMC
As requested by pawelw1, here are the MDM and SDS performance parameters:
S1:~# scli --query_performance_parameters --sds_name SDS1_S1 --print_all
MDM Performance Parameters:
Active profile: high_performance
Parameter                                   Current Value  Profile Value
MDM_NUMBER_SDC_RECEIVE_UMT                  5              5
MDM_NUMBER_SDS_RECEIVE_UMT                  10             10
MDM_NUMBER_SDS_SEND_UMT                     10             10
MDM_NUMBER_SDS_KEEPALIVE_RECEIVE_UMT        10             10
MDM_SDS_CAPACITY_COUNTERS_UPDATE_INTERVAL   1              1
MDM_SDS_CAPACITY_COUNTERS_POLLING_INTERVAL  5              5
MDM_SDS_VOLUME_SIZE_POLLING_INTERVAL        15             15
MDM_SDS_VOLUME_SIZE_POLLING_RETRY_INTERVAL  5              5
MDM_NUMBER_SDS_TASKS_UMT                    1024           1024
MDM_INITIAL_SDS_SNAPSHOT_CAPACITY           1024           1024
MDM_SDS_SNAPSHOT_CAPACITY_CHUNK_SIZE        5120           5120
MDM_SDS_SNAPSHOT_USED_CAPACITY_THRESHOLD    50             50
MDM_SDS_SNAPSHOT_FREE_CAPACITY_THRESHOLD    200            200
MDM_NET_ALLOC_RCV_BUFFER_WAIT_MS            500            500
MDM_NET_BREAK_DO_IO_LOOP                    5              5
SDS d62b204f00000002 Performance Parameters:
Active profile: high_performance
Parameter                                   Current Value  Profile Value
MDM_NUMBER_SOCKETS_PER_SDS_IP               4              4
MDM_SDS_KEEPALIVE_TIME                      5000           5000
SDS_NUMBER_NETWORK_UMT                      8              8
SDS_TCP_SEND_BUFFER_SIZE                    4096           4096
SDS_TCP_RECEIVE_BUFFER_SIZE                 4096           4096
SDS_MAX_NUMBER_ASYNCHRONOUS_IO_PER_DEVICE   128            128
SDS_NUMBER_SDC_IO_UMT                       500            500
SDS_NUMBER_SDS_IO_UMT                       500            500
SDS_NUMBER_SDS_COPY_IO_UMT                  164            164
SDS_NUMBER_COPY_UMT                         164            164
SDS_NUMBER_OS_THREADS                       8              8
SDS_NUMBER_SOCKETS_PER_SDS_IP               4              4
SDS_NUMBER_IO_BUFFERS                       5              5
SDS_NET_ALLOC_RCV_BUFFER_WAIT_MS            1000           1000
SDS_NET_BREAK_DO_IO_LOOP                    5              5
The output is the same for all three MDM/SDS nodes.
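Since the dump is identical on all three nodes, diffing it by eye gets tedious. Below is a small Python sketch that parses the text of a `--query_performance_parameters` dump like the one above into a dictionary, so a single parameter can be looked up or current values compared against the profile. It assumes each parameter row is a name followed by two integers, which is what the pasted output shows; `parse_parameters` and `drifted` are illustrative helper names, not part of any ScaleIO tooling.

```python
SAMPLE_DUMP = """\
SDS d62b204f00000002 Performance Parameters:
Active profile: high_performance
Current Value Profile Value
SDS_TCP_RECEIVE_BUFFER_SIZE 4096 4096
SDS_NUMBER_IO_BUFFERS 5 5
SDS_NET_ALLOC_RCV_BUFFER_WAIT_MS 1000 1000
"""


def parse_parameters(dump):
    """Map parameter name -> (current value, profile value).

    Assumes a parameter row is NAME CURRENT PROFILE with integer values,
    as in the output pasted in this thread; any other line is skipped.
    """
    params = {}
    for line in dump.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            params[parts[0]] = (int(parts[1]), int(parts[2]))
    return params


def drifted(params):
    """Return only the parameters whose current value differs from the profile."""
    return {name: pair for name, pair in params.items() if pair[0] != pair[1]}


if __name__ == "__main__":
    params = parse_parameters(SAMPLE_DUMP)
    current, profile = params["SDS_NUMBER_IO_BUFFERS"]
    print(f"SDS_NUMBER_IO_BUFFERS: current={current}, profile={profile}")
    print("Parameters differing from profile:", drifted(params) or "none")
```

Run against the output above, this shows SDS_NUMBER_IO_BUFFERS at 5 and no parameter differing from the high_performance profile.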
pawelw1
February 21st, 2017 02:00
Hi CCMC,
Please check how many IO buffers you have configured on your SDS ("scli --query_sds --sds_id xxx"). If you have fewer than 5, please increase the number to 5 and see whether the errors keep appearing.
Thank you,
Pawel