PowerFlex: SDS decoupled Network Test
Summary: This article describes how to troubleshoot an SDS_DECOUPLED error reported by the MDM.
Symptoms
An SDS disconnection can cause the following symptoms:
- Storage rebuild or rebalance operations
- Reduced spare capacity
- Increased disk usage on the remaining SDS nodes
- Possible volume corruption
- Long device I/O times or other device errors
Look for the following errors in the MDM event logs:
7256 2017-07-13 14:20:12.410 SDS_DECOUPLED ERROR SDS: SDS_10.241.xxx.xxx (id: 3711776c00000003) decoupled.
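If the MDM event log is long, a short script can pull out the decoupled SDSs. The Python sketch below assumes every event line has the same layout as the sample above (event ID, date, time, event name, severity, message); parse_decoupled and find_decoupled are hypothetical helper names, and other PowerFlex releases may format events differently.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Pattern modelled only on the sample event line in this article;
# other PowerFlex releases may format events differently (assumption).
DECOUPLED_RE = re.compile(
    r"^(?P<event_id>\d+)\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"SDS_DECOUPLED\s+ERROR\s+SDS:\s+(?P<sds_name>\S+)\s+"
    r"\(id:\s*(?P<sds_id>[0-9a-f]+)\)\s+decoupled\."
)


class DecoupledEvent(NamedTuple):
    event_id: int
    timestamp: str
    sds_name: str
    sds_id: str


def parse_decoupled(line: str) -> Optional[DecoupledEvent]:
    """Return the event fields if the line is an SDS_DECOUPLED error, else None."""
    m = DECOUPLED_RE.match(line.strip())
    if m is None:
        return None
    return DecoupledEvent(
        event_id=int(m["event_id"]),
        timestamp=f'{m["date"]} {m["time"]}',
        sds_name=m["sds_name"],
        sds_id=m["sds_id"],
    )


def find_decoupled(lines: Iterable[str]) -> List[DecoupledEvent]:
    """Collect every SDS_DECOUPLED event from an iterable of log lines."""
    return [ev for ev in map(parse_decoupled, lines) if ev is not None]


if __name__ == "__main__":
    sample = ("7256 2017-07-13 14:20:12.410 SDS_DECOUPLED ERROR SDS: "
              "SDS_10.241.xxx.xxx (id: 3711776c00000003) decoupled.")
    for ev in find_decoupled([sample]):
        # prints: 3711776c00000003 SDS_10.241.xxx.xxx 2017-07-13 14:20:12.410
        print(ev.sds_id, ev.sds_name, ev.timestamp)
```

The SDS ID extracted this way is the same ID that appears in scli output, which makes it easier to match an event to a specific node.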
Cause
Resolution
Running scli --query_all_sds may show disconnect messages for the affected SDS.
Verify the connection between the MDM and the SDS nodes:
For any connection that the SDS reports it cannot use, run telnet to port 7072 between the two hosts. If the connection fails, confirm that the SDS process is running on the target server. If it is, check the network and both hosts for firewalls or anything else that could be blocking the port.
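The telnet check above can also be scripted when many host pairs need testing. This standard-library Python sketch attempts a TCP connection to port 7072 and sorts failures into rough categories: a refused connection usually means the host is reachable but nothing is listening (check the SDS process), while a timeout or an unreachable host points more toward the network or a firewall. check_port is a hypothetical helper, and the mapping from error to likely cause is a heuristic, not a definitive diagnosis.

```python
import errno
import socket
import sys

SDS_PORT = 7072  # SDS port used throughout this article


def check_port(host: str, port: int = SDS_PORT, timeout: float = 3.0) -> str:
    """Try a plain TCP connect (like `telnet host 7072`) and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # Host answered but nothing is listening: check the SDS process.
        return "refused"
    except socket.timeout:
        # Packets silently dropped: often a firewall (heuristic).
        return "timeout"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            # Same condition as telnet's "No route to host".
            return "unreachable"
        return f"error: {exc}"


if __name__ == "__main__":
    # Hypothetical usage: python check_sds_port.py <sds-ip> [<sds-ip> ...]
    for host in sys.argv[1:]:
        print(f"{host}:{SDS_PORT} -> {check_port(host)}")
```

Running it from each SDC or SDS with the peer SDS addresses as arguments covers the same ground as the manual telnet sessions, one line of output per host.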
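To measure bandwidth between SDS nodes, start a network test on the affected SDS:
scli --start_sds_network_test --sds_ip 10.241.xxx.xxx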
Retrieve the results with:
scli --query_sds_network_test --sds_ip 10.241.xxx.xxx
SDS with IP 10.241.xxx.xxx (port 7072) returned information on 4 SDSs
SDS 3711505b00000000 10.xxx.xxx.xxx bandwidth 183.8 MB (188254 KB) per-second
SDS 3711505c00000001 10.xxx.xxx.xxx bandwidth 182.9 MB (187245 KB) per-second
SDS 3711505d00000002 10.xxx.xxx.xxx bandwidth 182.2 MB (186579 KB) per-second
SDS 3711776d00000004 10.xxx.xxx.xxx bandwidth 174.7 MB (178937 KB) per-second
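When a cluster has many SDS nodes, comparing the bandwidth figures by eye is error-prone. The Python sketch below parses per-peer lines in the format shown in the network test output above and flags peers whose bandwidth falls well below the median. The regular expression is modelled only on that sample output, the 0.8 ratio is an arbitrary illustrative threshold rather than a documented PowerFlex limit, and parse_peers and slow_peers are hypothetical names.

```python
import re
from statistics import median
from typing import Dict, List

# Matches per-peer lines of `scli --query_sds_network_test` as shown in this
# article; the exact format may vary between releases (assumption).
PEER_RE = re.compile(
    r"SDS\s+(?P<sds_id>[0-9a-f]{16})\s+(?P<ip>\S+)\s+bandwidth\s+"
    r"[\d.]+\s+MB\s+\((?P<kb>\d+)\s+KB\)\s+per-second"
)


def parse_peers(output: str) -> Dict[str, int]:
    """Map each peer SDS ID to its measured bandwidth in KB per second."""
    return {m["sds_id"]: int(m["kb"]) for m in PEER_RE.finditer(output)}


def slow_peers(bandwidth: Dict[str, int], ratio: float = 0.8) -> List[str]:
    """Return peer IDs whose bandwidth is below `ratio` times the median.

    The 0.8 default is an illustrative threshold only.
    """
    if not bandwidth:
        return []
    reference = median(bandwidth.values())
    return sorted(i for i, kb in bandwidth.items() if kb < ratio * reference)
```

Comparing against the median rather than the mean keeps one very slow link from dragging down the reference value it is being measured against.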
To check whether connectivity is the problem, open a telnet session to port 7072 on the SDS from each SDC. If the connection is rejected or fails, check the network connectivity between the hosts. A successful connection looks like this:
[root@scaleio-1 ~]# telnet 10.xxx.xxx.xxx 7072
Trying 10.xxx.xxx.xxx...
Connected to 10.xxx.xxx.xxx.
Escape character is '^]'.
Here is an example of a failed connection:
[root@scaleio-1 ~]# telnet 10.xxx.xxx.xxx 7072
Trying 10.xxx.xxx.xxx...
telnet: connect to address 10.xxx.xxx.xxx: No route to host