May 14th, 2013 06:00

EqualLogic volume distribution

Hi all,

I have an EqualLogic group with two PS6100s and two PS4100s working together. I was always under the impression, based on official documentation, that volumes would be capacity load balanced across a maximum of three members (note it says "under normal circumstances"). However, on this particular installation I am seeing several volumes spread across four members, which is a surprise to me. Can anyone tell me why I am suddenly seeing volumes load balanced across four members?

Note: I just recently joined the two PS4100s into the PS6100s' group, and one of the PS4100s is still performing its RAID verify.

Thank you!

5 Practitioner • 274.2K Posts

May 14th, 2013 08:00

It's likely rebalancing the space at the moment. When you have more than three members, the group will pick and choose different combinations of three members for different volumes.

You can either wait until things settle out and see if you still have the issue, or open a support case so they can review the diags.

What kind of switches are you using? Make sure you have sufficient inter-switch connectivity. Adding two members is going to add load to the ISL, since a large amount of data is moved between the members. Problems at that level can delay this movement and result in what you are seeing.
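For a sense of scale, here is a rough back-of-the-envelope sketch of the data movement involved; every figure in it (ISL size, host share, amount of data to rebalance) is an assumption for illustration, not a measurement from this group:

```python
# Rough estimate of how long a capacity rebalance could take when all
# inter-member traffic shares a small ISL with host I/O.
# Every figure below is an illustrative assumption, not a measurement.

per_link_mb_s = 110          # practical payload of one 1 GbE link, MB/s
isl_links = 2                # assumed 2 x 1 GbE port channel
host_share = 0.5             # assume host iSCSI traffic takes half the ISL
data_to_move_tb = 5          # hypothetical amount of data being rebalanced

usable_mb_s = per_link_mb_s * isl_links * (1 - host_share)
hours = data_to_move_tb * 1024 * 1024 / usable_mb_s / 3600
print(f"Moving {data_to_move_tb} TB at ~{usable_mb_s:.0f} MB/s takes ~{hours:.0f} hours")
```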

18 Posts

May 14th, 2013 08:00

I agree it's probably rebalancing. I'll need to see how this changes over the next few days.

I checked the ISL, which is 2x 1 Gbps in a port channel, and we're only seeing approximately 100 Mbps between the two storage switches, so I have no concerns there.

5 Practitioner • 274.2K Posts

May 14th, 2013 09:00

That's not enough of an ISL. It's not all about MB/sec; I've seen this often over the years. You need to greatly increase that ISL. You are trying to drive at minimum 6x ports, just from the arrays, over two ports. Then add in all the servers, which will also use that ISL, and you have seriously oversubscribed it. The minimum would be 6-8x GbE ports in that ISL, or multiple 10 GbE ports if available.
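To make the port math concrete, here is a small sketch of the arithmetic; the active-port counts reflect the PS6100/PS4100 controllers, while the even cabling split across the two switches is an assumption:

```python
# Port-count arithmetic behind the 6-8x GbE ISL recommendation.
# Assumes the active array ports are cabled evenly across the two switches.

ps6100_active_ports = 4      # active 1 GbE ports per PS6100
ps4100_active_ports = 2      # active 1 GbE ports per PS4100

total_array_ports = 2 * ps6100_active_ports + 2 * ps4100_active_ports  # 12
ports_per_switch = total_array_ports // 2                              # 6

isl_links = 2                # the existing 2 x 1 GbE ISL
ratio = ports_per_switch / isl_links
print(f"{ports_per_switch} array ports per switch funnel into a {isl_links}-link ISL: "
      f"{ratio:.0f}:1 oversubscription before counting any host traffic")
```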

What kind of switches are they?

5 Practitioner • 274.2K Posts

May 14th, 2013 11:00

Re: ISL. That's not true. The cables from the arrays and servers should be distributed across both switches for proper HA. The array will use any available port to reach an initiator or another port on a different member; it has no awareness of which switch a port is on. So a server NIC connected to switch A must use the ISL to reach an array port on switch B.

When you are using MEM or HIT, it will try to connect directly from every server NIC to every member that has data for that volume.  NLB is not used in that case, since MEM or HIT manages the connections.

Neither NLB nor MEM/HIT has awareness of retransmits or pause frames.

I've seen this scenario many times in support. After adding arrays to a group without a sufficient ISL, performance goes down. It can also prevent the space balancer and advanced performance load balancer from working properly. Increasing the ISL resolves the problem; in larger configurations, replacing the switches with stacked switches does.
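As a rough illustration of why hosts cabled to both switches still load the ISL, this sketch counts how many possible initiator-to-target sessions land on the opposite switch; the NIC and port counts are assumptions, not this site's actual numbers:

```python
# With host NICs and active array ports spread evenly across two switches,
# and no switch awareness in the initiator or the array, roughly half of all
# possible iSCSI sessions traverse the ISL. Counts below are illustrative.

switches = ("A", "B")
host_nics = [switches[i % 2] for i in range(8)]     # 8 host NICs, alternating switches
array_ports = [switches[i % 2] for i in range(12)]  # 12 active array ports, alternating

pairs = [(h, a) for h in host_nics for a in array_ports]
cross_isl = sum(1 for h, a in pairs if h != a)
print(f"{cross_isl} of {len(pairs)} possible sessions cross the ISL "
      f"({cross_isl / len(pairs):.0%})")
```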

Regards,

18 Posts

May 14th, 2013 11:00

There are two Cisco 3560Es dedicated to servicing the 2x PS6100s and 2x PS4100s in a full mesh topology, with all connected hosts uplinked to BOTH switches. There is no case where the only path between a host and the storage REQUIRES the ISL (unless there were a NIC failure on a host). I was under the impression that if the ISL ever became oversubscribed and latency increased due to TCP retransmits or pause frames, MEM or Dell's DSM would be able to NLB the sessions accordingly.

It's a difficult statement to back up when there are zero indications of an oversubscribed ISL. Packets per second, megabits per second, interface statistics, buffers, etc., are all well within normal parameters.

18 Posts

May 14th, 2013 12:00

I messed up the diagram a little bit when it comes to the active/inactive ports on the PS6100s, but you get the idea: 4 active and 4 inactive ports to each switch.

18 Posts

May 14th, 2013 12:00

We are distributed correctly for HA.

I drew up a quick diagram:

http://i.imgur.com/HmyAtuJ.png

5 Practitioner • 274.2K Posts

May 14th, 2013 12:00

I do understand, thank you for the diagram. It confirms the importance of the ISL not being a bottleneck.

5 Practitioner • 274.2K Posts

May 14th, 2013 13:00

iSCSI is very bursty, so SNMP monitoring can be misleading. You commonly see the issue during a spike in the I/O load. You might just not get as much performance as you otherwise could; it "works", but not optimally.
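A quick illustration of why averaged counters can hide this; the numbers are made up:

```python
# A 5-minute SNMP average can look nearly idle even when the link is
# saturated in short bursts. Figures below are illustrative assumptions.

poll_interval_s = 300        # typical SNMP polling interval
burst_s = 5                  # seconds per interval the link runs at line rate
line_rate_mbps = 1000        # 1 GbE line rate in Mb/s

avg_mbps = burst_s * line_rate_mbps / poll_interval_s
print(f"A link 100% busy for {burst_s}s of each {poll_interval_s}s poll "
      f"reports only ~{avg_mbps:.0f} Mb/s average")
```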

Regards,

18 Posts

May 14th, 2013 13:00

I'm glad that clarified things. You had me a bit worried, even though I was sure it was sufficient (for now). :)

I'll keep an eye on the volume load balancing.
