
November 20th, 2012 06:00

Group IP election process in an event of a network outage

In the event of a network outage that makes the group IP address unavailable, how does the group IP re-election process work? How do the other arrays work out whether they are online and available to take over the group IP alias? How long could this process take? We are seeing our XenServers reboot, which happens after the iSCSI target has been offline for 144 seconds. This is expected behaviour, and we need to understand whether a component in the network is causing the problem or the group IP election process is taking more than 144 seconds. Would having more arrays in the pool, more connections, or a large number of volumes have any impact on the group IP re-election?

Thanks in advance.

Oli


November 20th, 2012 08:00

The group IP (Well Known Address, or WKA) is assigned to one of the members, and that member is then known as the “group lead”. This is a proprietary election process in which each member maintains communication with the others through a mesh connection and establishes a keep-alive value. Once one of the members no longer responds (the keep-alive value is exceeded), the election process is initiated.
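The real election algorithm is proprietary, so the following is only a generic Python sketch of the keep-alive idea described above, not how the arrays actually do it. It assumes each member records when it last heard from every peer, and it invents a tie-break rule (the responsive member with the lowest name wins) purely for illustration; `Member`, `unresponsive`, `elect_group_lead`, the member names, and the 15-second timeout are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    last_heard: float  # time (seconds) the last keep-alive from this member arrived


def unresponsive(members: List[Member], now: float, keepalive_timeout: float) -> List[str]:
    """Names of members whose keep-alive value has been exceeded."""
    return [m.name for m in members if now - m.last_heard > keepalive_timeout]


def elect_group_lead(
    members: List[Member], now: float, keepalive_timeout: float, current_lead: str
) -> Optional[str]:
    """Hypothetical election rule, NOT the real (proprietary) algorithm.

    The current lead keeps the group IP while it still answers keep-alives.
    If it has timed out, the responsive member with the lowest name takes over.
    Returns None if no member is responsive.
    """
    down = set(unresponsive(members, now, keepalive_timeout))
    if current_lead not in down:
        return current_lead
    candidates = sorted(m.name for m in members if m.name not in down)
    return candidates[0] if candidates else None


members = [Member("member-a", 100.0), Member("member-b", 128.0), Member("member-c", 129.0)]
# member-a (the current lead) last answered 30 s ago, past the 15 s timeout
print(elect_group_lead(members, now=130.0, keepalive_timeout=15.0, current_lead="member-a"))  # prints member-b
```

Keeping the current lead while it still responds mirrors the point that the group lead normally stays put and only moves when it stops answering.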

In normal operation, the group lead is fairly stable; that is, it doesn't move from member to member and only changes in certain situations. Examples include a controller hardware failure on the group lead (which initiates a controller failover), the restart phase of the group lead array during a firmware update, and vacating the group lead member from the group.

In the event of a network outage, the typical timeout is 10 to 60 seconds.

The group IP only matters for iSCSI host connections, i.e., when the host initiator requests an ARP reconnect. So the group IP being down for only a few seconds should be fine, provided that the iSCSI keep-alives on each host are configured per the recommendations on the support site.
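To relate this to the 144-second reboot in the question, it can help to add up the pieces of an outage. This Python sketch assumes the time the host sees its target as unavailable is roughly the group lead failover time plus whatever the host needs to reconnect (ARP refresh, iSCSI login). The function names and the 30-second reconnect figure are hypothetical; only the 10 to 60 second and 144 second values come from this thread.

```python
# Figures taken from this thread: failover typically takes 10 to 60 seconds,
# and XenServer reboots after 144 seconds without its iSCSI target.
ELECTION_WORST_CASE_S = 60
XENSERVER_REBOOT_THRESHOLD_S = 144


def outage_headroom(election_s: float, host_reconnect_s: float,
                    threshold_s: float = XENSERVER_REBOOT_THRESHOLD_S) -> float:
    """Seconds left before the host gives up on the target (zero or negative = reboot)."""
    return threshold_s - (election_s + host_reconnect_s)


def survives_outage(election_s: float, host_reconnect_s: float,
                    threshold_s: float = XENSERVER_REBOOT_THRESHOLD_S) -> bool:
    """True if the combined outage stays strictly below the reboot threshold."""
    return outage_headroom(election_s, host_reconnect_s, threshold_s) > 0


# host_reconnect_s is a hypothetical value: measure it on your own hosts.
print(outage_headroom(ELECTION_WORST_CASE_S, host_reconnect_s=30))  # prints 54
```

Under these assumptions, even a worst-case 60-second election leaves headroom, which suggests that reboots at 144 seconds point to something extending the outage (the network path or host keep-alive settings, for example) rather than the election alone.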

If a volume has slices on more than one member (spans more than one member), and one of the members is unreachable, then the volume is placed offline until that member is brought back online.
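The spanning rule above amounts to a subset check: a volume is available only if every member holding one of its slices is reachable. This is a hypothetical Python sketch; the function names, volume names, and member names are made up for illustration.

```python
from typing import Dict, Iterable, List


def volume_online(slice_members: Iterable[str], reachable: Iterable[str]) -> bool:
    """A volume stays online only if every member holding one of its slices is reachable."""
    return set(slice_members) <= set(reachable)


def offline_volumes(volumes: Dict[str, List[str]], reachable: Iterable[str]) -> List[str]:
    """Names of volumes that would be placed offline, sorted for stable output."""
    up = set(reachable)
    return sorted(name for name, members in volumes.items()
                  if not volume_online(members, up))


volumes = {
    "vol1": ["member-a"],
    "vol2": ["member-a", "member-b"],  # spans two members
    "vol3": ["member-c"],
}
print(offline_volumes(volumes, reachable={"member-a", "member-c"}))  # prints ['vol2']
```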

-joe


November 22nd, 2012 02:00

Great, thank you.
