ECS: CAS connection string and SDK read failover differences with Centera
Summary: Centera and ECS respond differently to the initial probe that the Software Development Kit (SDK) sends after the pool opens.
Symptoms
When connecting to ECS using the Content Addressable Storage (CAS) protocol with JCASScript, the Replica address is empty in the output of the info command.
How does the SDK fail over during reads if the primary ECS is not available?
Centera and ECS respond differently to the initial probe after the SDK pool opens.
Cause
Resolution
Centera:
If the primary Centera IPs are supplied in the connection string, Centera sends the replica IP addresses back to the SDK in its response to the initial probe after the pool opens. The SDK uses these replica IPs for operational failover (read, write, delete, exist) when the primary or the connection fails (Centera stops or the network to the primary stops).
If the SDK option lazy_pool_open is used, the SDK does not probe the secondary addresses at pool open. Secondary addresses are probed only if there is an operational or network failover.
ECS:
If only the primary IP address is specified in the application connection string, ECS does not send back replica IP addresses in its response to the initial probe after the pool opens, so the SDK does not know about the secondary IP addresses. On ECS, a bucket is global and is designed to provide strong consistency. Once an object is written, ECS fetches it irrespective of its replication status. This provides operational failover (read, write, exist, and delete) from any Virtual Data Center (VDC).
Having primary and secondary addresses in the connection string is recommended for connection failover.
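As a minimal sketch of this recommendation, the snippet below assembles a connection string that lists the primary VDC addresses followed by the secondary VDC addresses. It assumes the comma-separated address list form of a CAS connection string. The helper name build_connection_string and the IP addresses (taken from documentation-only ranges) are hypothetical, and any authentication suffix is intentionally left out.

```python
def build_connection_string(primary_ips, secondary_ips):
    """Join primary then secondary addresses into one comma-separated list.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    if not primary_ips:
        raise ValueError("at least one primary address is required")
    # Primary addresses go first so the SDK's initial probe targets a primary node.
    return ",".join(list(primary_ips) + list(secondary_ips))


# Placeholder addresses; replace with the real VDC node IPs.
primary_vdc = ["192.0.2.10", "192.0.2.11"]
secondary_vdc = ["198.51.100.10", "198.51.100.11"]

connection_string = build_connection_string(primary_vdc, secondary_vdc)
print(connection_string)
```

Keeping the secondary addresses in the string from the start avoids having to edit the application configuration during an outage.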
The SDK first probes the first IP in the connection string. When it receives all the primary VDC IPs as part of the probe response, the SDK does not probe the other IPs in the connection string (as with lazy_pool_open). It uses the other IPs in the connection string for connection failover.
A normal pool open (that is, not using lazy_pool_open), which is what Engineering recommends, first probes the first IP in the connection string. Once the SDK receives the response, it logically separates the primary addresses, probes only the next secondary IP in the connection string, and keeps all the secondary IP addresses in cache. If the primary VDC cannot be reached and Access During Outage (ADO, 15-minute timeout) is enabled, the SDK tries all the primary IPs (same as Centera). After all the primary IPs return network errors, it tries the secondary IP. Once the 15-minute ADO timeout occurs, the secondary VDC gives access to read, write, delete, and exist operations.
If the secondary IPs are not in the connection string and the primary VDC fails or loses network connectivity, the application connection string must be manually updated to include the secondary VDC IPs to access the secondary VDC. The ADO timeout of 15 minutes must elapse before operations work.
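The failover order described above can be illustrated with a simplified model. This is only a sketch of the sequence (all primary IPs first, then the secondary IPs, with the secondary VDC serving operations once the ADO timeout has passed). It is not the SDK's implementation, and the function names failover_order and secondary_serves_operations are hypothetical.

```python
ADO_TIMEOUT_MINUTES = 15  # ADO timeout stated in this article


def failover_order(primary_ips, secondary_ips, reachable):
    """Return (address, attempts): the first reachable address and the try order.

    Models the described order only: every primary IP is tried before
    any secondary IP. 'reachable' is the set of addresses that respond.
    """
    attempts = []
    for ip in list(primary_ips) + list(secondary_ips):
        attempts.append(ip)
        if ip in reachable:
            return ip, attempts
    # No address responded to the connection attempt.
    return None, attempts


def secondary_serves_operations(minutes_since_outage, ado_enabled=True):
    """True once ADO is enabled and the 15-minute timeout has elapsed."""
    return ado_enabled and minutes_since_outage >= ADO_TIMEOUT_MINUTES


# Example: both primary nodes are down, one secondary node is up.
address, tried = failover_order(
    ["192.0.2.10", "192.0.2.11"], ["198.51.100.10"], {"198.51.100.10"}
)
print(address, tried, secondary_serves_operations(20))
```

In this model the connection reaches the secondary address as soon as the primaries fail, but operations succeed only after the ADO timeout check returns true.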