PowerScale: CloudPools NANON Problems
Summary: CloudPools requires every node in the cluster to be able to connect to the cloud provider to function properly. Intermittent issues can occur when even a single node cannot reach it, a condition known as Not All Nodes On Network (NANON). ...
Symptoms
The following symptoms can indicate that not all nodes can connect to the cloud provider or cloud bucket:
- If a node is not part of a network pool, errors appear when retrieving files. These errors can be intermittent or seemingly random.
- If a node is part of a cluster that is a SyncIQ (SIQ) target, I/O errors occur that cause jobs to fail. Some jobs may succeed, or only a few policies may show problems, but that does not prove that the configuration works or that no failures have occurred.
- You may see delays in recalling files.
- You may see random failures when recalling files, with subsequent retrieval attempts succeeding.
Cause
CloudPools requires each node to have an external network connection capable of transmitting data between PowerScale and the cloud provider.
CloudPools uses the lowest-priority network pool that can connect. If a pool cannot connect, CloudPools moves down the list of network pools until one succeeds. If none succeeds, it displays an error that it cannot connect.
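The fallback behavior described above can be pictured with a minimal Python sketch. This is an illustration only, not OneFS code: the function name `pick_pool`, the pool names, and the `can_connect` callback are all hypothetical, and the list is assumed to be already sorted in the order CloudPools tries the pools.

```python
from typing import Callable, Iterable, Optional


def pick_pool(pools_in_order: Iterable[str],
              can_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first pool that connects, or None if none does.

    pools_in_order is assumed to be pre-sorted in the order the pools
    are tried. can_connect is a hypothetical probe for one pool.
    """
    for pool in pools_in_order:
        if can_connect(pool):
            return pool
    # No pool could connect: this is the case that surfaces as a
    # "cannot connect" error on the affected node.
    return None


# Hypothetical example: a node whose only working pool is the last one tried.
reachable = {"pool2"}
chosen = pick_pool(["pool0", "pool1", "pool2"], lambda p: p in reachable)

# Hypothetical example: a node that belongs to no network pool at all.
orphan = pick_pool([], lambda p: True)
```

In this sketch `chosen` is `"pool2"` and `orphan` is `None`. The second case corresponds to a node that is not on the network: every request that lands on that node fails, while the same request on another node succeeds, which is why the symptoms look intermittent.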
All nodes participate in CloudPools activity. This is by design, so that data can be recalled or offloaded as quickly as possible.
There is no way to change this behavior.
Resolution
While you can attempt a brute-force workaround, such as restarting the SyncIQ job each time it fails, that is not a long-term solution.
The only solution is to verify that every node can connect to your cloud bucket.
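One way to approach this verification is a simple TCP reachability probe run from each node in turn. The following Python sketch is an assumption-laden example, not a supported tool: the helper names `can_reach` and `unreachable_nodes`, the node names, and the endpoint `cloud-endpoint.example.com` are hypothetical placeholders, and port 443 is assumed only because cloud object storage is commonly reached over HTTPS. Substitute the endpoint and port that your cloud account actually uses.

```python
import socket
from typing import Dict, List


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def unreachable_nodes(results: Dict[str, bool]) -> List[str]:
    """Given per-node probe results, list the nodes that failed."""
    return sorted(node for node, ok in results.items() if not ok)


if __name__ == "__main__":
    # Run this on each node and record the result against that node's name.
    # "cloud-endpoint.example.com" is a placeholder, not a real endpoint.
    print(can_reach("cloud-endpoint.example.com", 443, timeout=5.0))
```

For example, if the per-node results were collected as `{"node-1": True, "node-2": False}`, then `unreachable_nodes` would return `["node-2"]`, identifying the node that needs a working external network connection. A successful TCP connection shows only that the network path is open. It does not confirm credentials, TLS, or bucket permissions.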