October 13th, 2016 02:00

Issues when the SDC Only IPs network is down

Hi,

I have ScaleIO version 2.0.0.2 with a 3-node MDM cluster, 4 SDS nodes, and the 3 networks below.

   MDM IPs

   SDS - SDS Only IPs

   SDS - SDC Only IPs

A volume is mapped to an SDC node and I/O between the SDC and SDSs is fine.

During some tests, I brought down (ifdown) the interface for the SDS - SDC network on one SDS node, and ran into the following problems.

- No alerts generated:

     With SDS - SDS Only IPs, a network outage is alerted immediately and a rebuild starts.

     But with SDS - SDC Only IPs, a network outage raises no alert and triggers no action.

     On the SDC node, the TCP connection to the downed node stays in SYN_SENT and keeps retrying (the connections to the live nodes are ESTABLISHED).

     On the SDS node, log entries are written to trc.0, but only for about a minute.

          The logs report Oscillation of type 3(RCV_KA_DISCONNECT) and Oscillation of type 3(SOCKET_DOWN).

     On the MDM node, no log entries report the error.

- I/O degraded or failed:

     Disconnecting just one of the 4 SDS nodes degrades I/O performance, regardless of which SDS node it is.

     A dd command takes on average more than 10 times as long to finish (and sometimes ends in an error).

     With 3 of the 4 SDS nodes disconnected, the SDC node still sees the block device but cannot run the fdisk command on it.
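Since the MDM raises no alert here, the SYN_SENT symptom from the first bullet could be checked on the SDC node itself. Below is a minimal sketch in Python, not a ScaleIO feature. It assumes a Linux SDC node with the `ss` utility available; note that `ss` prints the state as `SYN-SENT`, whereas `netstat` prints `SYN_SENT`. The function names `syn_sent_peers` and `check_local_host` are made up for this example.

```python
import subprocess


def syn_sent_peers(ss_output):
    """Return the sorted peer IPs that have at least one socket in SYN-SENT.

    Expects the text printed by `ss -tan`, whose columns are:
    State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
    """
    peers = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header line, short lines, and sockets in any other state.
        if len(fields) < 5 or fields[0] != "SYN-SENT":
            continue
        # Split "ip:port" on the last colon so IPv6 addresses survive.
        host, _, _port = fields[4].rpartition(":")
        peers.add(host.strip("[]"))
    return sorted(peers)


def check_local_host():
    """Run `ss -tan` on this host and print a warning per stuck peer."""
    result = subprocess.run(
        ["ss", "-tan"], capture_output=True, text=True, check=True
    )
    for peer in syn_sent_peers(result.stdout):
        print(f"WARNING: connection attempts to {peer} are stuck in SYN-SENT")
```

Calling `check_local_host()` periodically (for example from cron) would flag any peer, SDS or otherwise, that the host cannot reach. Filtering the result against the known SDS - SDC Only IPs would narrow it to the case described here.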

I'd like to know whether there are any fail-safe or alerting actions I can set up. (Does the Oscillation type tell us anything?)


Best regards,

Seiji

October 14th, 2016 08:00

Hi Seiji,

Per the ScaleIO 2.0.1 Release Notes (see SCI-10915), this is a known issue:

When SDS ports are defined as SDC ONLY, and connectivity problems are encountered between SDC and SDS, the SDCs might issue IO errors. The MDM does not have any indication that the "SDC ONLY" network is down since it does not monitor SDS to SDC connectivity.

This behavior might change in future ScaleIO versions, but for the time being this is the way it behaves. Feel free to send a feature/enhancement request :-)

Cheers,

Pawel

October 15th, 2016 03:00

Hi Pawel,

Thank you for the clear explanation.

I might set up a monitoring tool that watches the SDS logs and sends a notification when this behavior occurs.
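A rough sketch of what such a log watcher could look like, in Python. The only thing taken from the actual logs is the message text quoted earlier in this thread, "Oscillation of type 3(RCV_KA_DISCONNECT)" and "Oscillation of type 3(SOCKET_DOWN)"; the rest of the trc.0 line format is unknown, so the regular expression matches only that fragment. The names `count_oscillations` and `scan_trace_file` are invented for this example, and the path to trc.0 is left as a parameter.

```python
import re
from collections import Counter

# Matches the fragment quoted in this thread, e.g.
# "Oscillation of type 3(SOCKET_DOWN)". Assumption: the rest of the
# trace line may vary, so nothing else is matched.
OSCILLATION_RE = re.compile(r"Oscillation of type\s*(\d+)\s*\((\w+)\)")


def count_oscillations(lines):
    """Count oscillation messages per reason name across the given lines."""
    counts = Counter()
    for line in lines:
        for _type_id, name in OSCILLATION_RE.findall(line):
            counts[name] += 1
    return counts


def scan_trace_file(path):
    """Count oscillation messages in one trace file (path is caller-supplied)."""
    with open(path, errors="replace") as handle:
        return count_oscillations(handle)
```

Because the SDS only logs these messages for about a minute after the interface goes down, the watcher would need to scan frequently, or remember what it has already seen, and send the notification whenever the returned counter is non-empty.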

Looking forward to a fix in the next version.

Cheers,

Seiji
