Unsolved


1 Rookie

 • 

34 Posts


May 6th, 2016 14:00

CloudArray reports that a cloud provider (ECS) is offline when it is really timing out.

We are trying to stabilize our CloudArray. When we upload data to archive, the transfer goes fine, but about 15 minutes after it completes, the Cloud Provider circle goes red, stating that a cloud provider is offline. Our cloud provider is a private ECS, so we checked the network, and it is stable. We worked with our resident and he found that it is timing out and therefore reporting the cloud provider as offline.

Once it reports that the cloud provider is offline, it takes 2 hours before it goes back online. Are there settings that can be tweaked to allow more time so that it does not time out, and another so that it does not wait 2 hours before rechecking whether the cloud provider is online? We need this to stay green all the time. Any help would be great!

Thanks,

Mike

Our CloudArray is at 7.0.0.0.8273

After posting this, I changed a few settings: bandwidth throttling is set to 125 Mbit/s from 7 AM to 7 PM, and the cloud performance optimizer is set to "Between 51 and 100 Mbit/s". We will try again and monitor more closely to see if it recurs, but I am still wondering if anyone has seen this kind of behavior before.

Message was edited by: PastorMike

2 Posts

May 9th, 2016 11:00

Hi PastorMike. Is this a production unit or a POC? A virtual or a physical CloudArray?

I would open up an SR and attach a log file immediately for reference.

Could you be more specific about this: "We worked with our resident and he found that it is timing out"?

What exactly is timing out? The network connection? However, you say that the network connection is fine.

If the network connection is fine and you are getting timeouts, are you thinking there is a NIC problem on the CloudArray?

If the CloudArray encounters any sort of timeout to an ECS node, the load balancer should direct it to a healthy node.

If we are not redirected, we assume Cloud DOWN.

We will continue accepting writes until the cache fills, and resume the cache flush if the link comes back up.

For reads, we will stay up as long as the read is served from cache. If we attempt a read from the Cloud and the Cloud cannot be reached, we will take the volume/share offline.
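
To make the write and read rules above concrete, here is a toy model in Python. It is only a sketch of the behavior as described in this post, not CloudArray code: the class name `CachedVolume`, its fields, and the idea of counting unflushed blocks against a fixed capacity are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CachedVolume:
    """Hypothetical model of the described behavior; not CloudArray code."""
    cache_capacity: int                          # assumed limit on unflushed blocks
    cloud_up: bool = True
    online: bool = True
    cache: dict = field(default_factory=dict)    # block id -> data held locally
    dirty: set = field(default_factory=set)      # block ids not yet flushed
    cloud: dict = field(default_factory=dict)    # stand-in for the object store

    def write(self, block, data):
        # Writes are accepted while the cloud is down, until the cache fills.
        cache_full = len(self.dirty) >= self.cache_capacity
        if not self.cloud_up and block not in self.dirty and cache_full:
            return False
        self.cache[block] = data
        self.dirty.add(block)
        if self.cloud_up:
            self.flush()
        return True

    def flush(self):
        # Push every unflushed block to the cloud stand-in.
        for block in sorted(self.dirty):
            self.cloud[block] = self.cache[block]
        self.dirty.clear()

    def read(self, block):
        # Reads served from cache never touch the cloud.
        if block in self.cache:
            return self.cache[block]
        # A cache miss while the cloud is unreachable takes the volume offline.
        if not self.cloud_up:
            self.online = False
            raise IOError("cloud unreachable; volume taken offline")
        return self.cloud[block]

    def link_restored(self):
        # The flush continues once the link is back. Whether the volume
        # returns online by itself is not stated above, so it is left alone.
        self.cloud_up = True
        self.flush()
```

In this model, a volume that only receives writes survives an outage until `cache_capacity` is reached, while a single read of a block that lives only in the cloud is enough to take it offline.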

Rick

2 Posts

May 9th, 2016 14:00

Took a look at the SR. Looks like DNS issues, external routing issues (is there a Proxy?)

And possibly (believe it or not) timezone/time difference between the CloudArray and ECS.
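
A rough way to check the time difference is to compare the CloudArray host's clock with the `Date` header of any HTTP response from the ECS endpoint. S3-compatible APIs commonly reject requests whose timestamp is too far from the server's clock, though the exact tolerance ECS applies is not something I can state here. The sketch below is Python; the function name `clock_skew_seconds` is made up for illustration, and fetching the header from your endpoint is left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header, local_now=None):
    """Return local time minus the server time from an HTTP Date header, in seconds.

    A positive result means the local clock is ahead of the server.
    """
    server_time = parsedate_to_datetime(date_header)
    if local_now is None:
        # Use an aware UTC timestamp so the subtraction is zone-independent.
        local_now = datetime.now(timezone.utc)
    return (local_now - server_time).total_seconds()
```

Because both values are compared in UTC, a different time zone setting by itself does not show up as skew; only a clock that is actually wrong does.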

Is DNS wildcarding enabled? That is, anything (*) dot the ECS namespace has to resolve to an IP.
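
One way to test the wildcard from the CloudArray's network is to resolve a few random labels under the ECS base name and see whether they all return an address. This is a minimal Python sketch: `wildcard_resolves` is a hypothetical helper, and the base name you pass in must be your own (the one in the usage line is a placeholder).

```python
import socket
import uuid


def wildcard_resolves(base_domain, resolver=socket.gethostbyname, samples=3):
    """Return True if randomly generated labels under base_domain all resolve."""
    for _ in range(samples):
        # A random label that should only resolve if a wildcard record exists.
        host = f"{uuid.uuid4().hex[:12]}.{base_domain}"
        try:
            resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            return False
    return True
```

For example, `wildcard_resolves("ecs.example.internal")` returning `False` would suggest the wildcard record is missing or not visible to the resolver the CloudArray uses.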

Also what kind of load balancer is in place?

I can't seem to add to the SR.

1 Rookie

 • 

34 Posts

May 9th, 2016 14:00

Hi Rick,

Matt and I discussed this. Neither of us knows the setup of the load balancer, and we suspect that it is possibly not redirecting to a healthy node, or to a less busy node, so one node becomes busy and the requests time out, which would cause it to say Cloud Down. Is that correct? I will update here later when I have learned more about the LB.

Thanks,

Mike

1 Rookie

 • 

34 Posts

May 11th, 2016 07:00

We also found a DC that is not in service in the CIFS config so we removed that and are monitoring.

The Load Balancer is configured for round robin only -- no other algorithm employed, and the connections and throughput are very well balanced.
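
For anyone following along, here is why balanced connection counts do not rule out a bad node. Plain round robin hands out requests evenly whether or not a node answers, so an unresponsive node still gets its share. The Python sketch below is a simulation only; `dispatch`, the node names, and the health-aware variant are assumptions for illustration, and I do not know whether our LB performs health checks.

```python
from collections import Counter
from itertools import cycle


def dispatch(nodes, healthy, n_requests, health_aware=False):
    """Simulate round robin; return (served, failed) request counts per node."""
    rotation = cycle(nodes)
    served, failed = Counter(), Counter()
    for _request in range(n_requests):
        node = next(rotation)
        if health_aware:
            # Skip unhealthy nodes, giving up after one full pass.
            for _attempt in range(len(nodes)):
                if node in healthy:
                    break
                node = next(rotation)
        if node in healthy:
            served[node] += 1
        else:
            failed[node] += 1  # this request would time out
    return served, failed
```

With four nodes and one of them unhealthy, plain round robin sends one request in four to the unhealthy node, while the health-aware variant sends none.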
