February 16th, 2017 12:00

ScaleIO: Root Cause of "sds_receive_buffer_alloc_failures"

I have a 3-node ScaleIO 2.0.0.2 cluster.

The nodes are SDS1_S1 (MDM slave), SDS2_S2 (MDM master), and SDS3_S3 (MDM Tie-Breaker).

Node SDS3_S3 log shows:

sds_receive_buffer_alloc_failures:

        Short window:   43100 failures in 60 seconds (limit is 20000). Last failure: 2017-02-09 13:25:16

        Medium window:  509701 failures in 3600 seconds (limit is 200000). Last failure: 2017-02-09 13:25:16
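For anyone comparing these counters across nodes, here is a minimal sketch of how the two window lines could be parsed to see how far each one is over its limit. The regular expression and the function name parse_windows are my own assumptions, inferred only from the two lines pasted above; this is not a documented ScaleIO log format.

```python
import re

# Pattern inferred from the two log lines quoted above
# (an assumption, not a documented format).
WINDOW_RE = re.compile(
    r"(?P<window>\w+) window:\s+(?P<failures>\d+) failures in (?P<seconds>\d+) seconds "
    r"\(limit is (?P<limit>\d+)\)"
)


def parse_windows(text):
    """Return one dict per window line with the failure rate and ratio to the limit."""
    results = []
    for m in WINDOW_RE.finditer(text):
        failures = int(m["failures"])
        seconds = int(m["seconds"])
        limit = int(m["limit"])
        results.append({
            "window": m["window"],
            "failures": failures,
            "seconds": seconds,
            "limit": limit,
            "per_second": failures / seconds,   # average failures per second
            "times_limit": failures / limit,    # how many times over the limit
        })
    return results


SAMPLE = """\
        Short window:   43100 failures in 60 seconds (limit is 20000). Last failure: 2017-02-09 13:25:16
        Medium window:  509701 failures in 3600 seconds (limit is 200000). Last failure: 2017-02-09 13:25:16
"""

for row in parse_windows(SAMPLE):
    print(f"{row['window']}: {row['per_second']:.1f} failures/s, "
          f"{row['times_limit']:.2f}x the limit")
```

Both windows here are more than twice their limit, and the short window's per-second rate is several times the medium window's average, which suggests the failures come in bursts rather than at a steady rate.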

What is the source of these failures and how does one correct and eliminate these errors?

Nodes SDS1_S1 and SDS2_S2 do not show any failures.

Thanks,

CCMC

As requested by pawelw, here are the MDM and SDS performance parameters:

S1:~# scli --query_performance_parameters --sds_name SDS1_S1 --print_all

MDM Performance Parameters:

        Active profile: high_performance

                                                                Current Value    Profile Value

        MDM_NUMBER_SDC_RECEIVE_UMT                                     5                5

        MDM_NUMBER_SDS_RECEIVE_UMT                                    10               10

        MDM_NUMBER_SDS_SEND_UMT                                       10               10

        MDM_NUMBER_SDS_KEEPALIVE_RECEIVE_UMT                          10               10

        MDM_SDS_CAPACITY_COUNTERS_UPDATE_INTERVAL                      1                1

        MDM_SDS_CAPACITY_COUNTERS_POLLING_INTERVAL                     5                5

        MDM_SDS_VOLUME_SIZE_POLLING_INTERVAL                          15               15

        MDM_SDS_VOLUME_SIZE_POLLING_RETRY_INTERVAL                     5                5

        MDM_NUMBER_SDS_TASKS_UMT                                    1024             1024

        MDM_INITIAL_SDS_SNAPSHOT_CAPACITY                           1024             1024

        MDM_SDS_SNAPSHOT_CAPACITY_CHUNK_SIZE                        5120             5120

        MDM_SDS_SNAPSHOT_USED_CAPACITY_THRESHOLD                      50               50

        MDM_SDS_SNAPSHOT_FREE_CAPACITY_THRESHOLD                     200              200

        MDM_NET_ALLOC_RCV_BUFFER_WAIT_MS                             500              500

        MDM_NET_BREAK_DO_IO_LOOP                                       5                5

SDS d62b204f00000002 Performance Parameters:

        Active profile: high_performance

                                                                Current Value    Profile Value

        MDM_NUMBER_SOCKETS_PER_SDS_IP                                  4                4

        MDM_SDS_KEEPALIVE_TIME                                      5000             5000

        SDS_NUMBER_NETWORK_UMT                                         8                8

        SDS_TCP_SEND_BUFFER_SIZE                                    4096             4096

        SDS_TCP_RECEIVE_BUFFER_SIZE                                 4096             4096

        SDS_MAX_NUMBER_ASYNCHRONOUS_IO_PER_DEVICE                    128              128

        SDS_NUMBER_SDC_IO_UMT                                        500              500

        SDS_NUMBER_SDS_IO_UMT                                        500              500

        SDS_NUMBER_SDS_COPY_IO_UMT                                   164              164

        SDS_NUMBER_COPY_UMT                                          164              164

        SDS_NUMBER_OS_THREADS                                          8                8

        SDS_NUMBER_SOCKETS_PER_SDS_IP                                  4                4

        SDS_NUMBER_IO_BUFFERS                                          5                5

        SDS_NET_ALLOC_RCV_BUFFER_WAIT_MS                            1000             1000

        SDS_NET_BREAK_DO_IO_LOOP                                       5                5

The output is the same for all three MDM/SDS nodes.
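To check a listing like this without reading it row by row, a small script along these lines could flag any parameter whose current value differs from the profile value. It is only a sketch: the row layout is assumed from the paste above, and parse_parameters and overridden are hypothetical helper names, not part of scli.

```python
import re

# Matches rows such as "SDS_NUMBER_IO_BUFFERS    5    5".
# The layout is inferred from the pasted output (an assumption).
ROW_RE = re.compile(r"^\s*([A-Z][A-Z0-9_]+)\s+(\d+)\s+(\d+)\s*$")


def parse_parameters(text):
    """Map each parameter name to a (current_value, profile_value) tuple."""
    params = {}
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            params[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return params


def overridden(params):
    """Return only the parameters whose current value differs from the profile."""
    return {name: values for name, values in params.items() if values[0] != values[1]}


SAMPLE = """\
        Active profile: high_performance
                                                Current Value    Profile Value
        SDS_TCP_RECEIVE_BUFFER_SIZE                                 4096             4096
        SDS_NUMBER_IO_BUFFERS                                          5                5
        SDS_NET_ALLOC_RCV_BUFFER_WAIT_MS                            1000             1000
"""

diff = overridden(parse_parameters(SAMPLE))
print(diff if diff else "all values match the active profile")
```

Saving the output from each node to a file and running the same check on each would confirm that no node has drifted from the high_performance profile.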

February 21st, 2017 02:00

Hi CCMC,

Please check how many IO buffers are configured on your SDS ("scli --query_sds --sds_id xxx"). If you have fewer than 5, please increase the number to 5 and see whether the errors keep appearing.

Thank you,

Pawel
