May 30th, 2016 09:00

EQL PS6500 High IOWAIT spikes during page moves (re-balance)

We have a five member PS6500 pool using 6248 switches.

The PS6500s are at firmware 7.1.9.

The symptoms are high iowait spikes for VMs while page moves are in progress, with datastores losing connectivity for a second or two. Note that the active page moves (I have pm mon in a window constantly monitoring) are not for the datastore(s) affected at that time. There is some correlation with the SAN HQ logs, which sometimes note connection reconnects or volume moves, but not 100% of the time.

So for example when the current pm plan is done everything is extremely stable. As soon as a new plan starts the issues rear their ugly head again.  

We do have a case open with a great pro support engineer, but it seems like my choices are either to split the pool into a three-member pool and a two-member pool, which is not something I want to do for a number of reasons, or to ask Dell to simply stop the re-balancing, period, as it is causing far more harm than good.

Anyone else experience the same issue and if so how did you solve it?


June 1st, 2016 10:00

I wanted to clarify a little more about Page Movement and how central it is to the PS Series SAN. 

The blocks of a volume are associated with "Pages" on the EQL array. Each page is 15MB in size. When you first write to a page, that page becomes allocated. Depending on how the LBAs of a volume are laid out at the moment, one member may have more of those LBAs than another member, so during heavy writes that member gets more IO than the others. This is especially true when you have a very large capacity member in the same pool as much smaller member arrays. If page movement wasn't enabled, it would be possible to fill up one member, slowing it down.
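To put rough numbers on that, here is a minimal sketch in Python. It only uses the 15MB page size mentioned above; the function names, the member names and the sample figures are made up for illustration, and it assumes 1 GB = 1024 MB. It is not how the array firmware does its accounting.

```python
import math

PAGE_SIZE_MB = 15  # page size stated in the post


def pages_for(volume_gb: float) -> int:
    """Pages needed to back a fully written volume (assumes 1 GB = 1024 MB)."""
    return math.ceil(volume_gb * 1024 / PAGE_SIZE_MB)


def used_fraction(allocated_pages: dict, capacity_pages: dict) -> dict:
    """Fraction of each member's page capacity that is currently allocated."""
    return {m: allocated_pages[m] / capacity_pages[m] for m in allocated_pages}


# Hypothetical pool: one large member and one small member.
capacity = {"big": 4000, "small": 1000}
allocated = {"big": 1000, "small": 900}
print(pages_for(500))                      # pages behind a 500 GB volume
print(used_fraction(allocated, capacity))  # the small member is far fuller
```

With numbers like these the small member is nearly full while the large one is mostly empty, which is the situation page movement is meant to correct.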

Additionally, this means that the other members aren't getting fully utilized. So the "Capacity Load Balancer" (CLB) periodically checks for that and redistributes pages to other members to keep free space in check on all the members. The "Advanced Performance Load Balancer" (APLB) runs when there's a significant difference in latency between the members and swaps busy pages for idle ones.
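As a toy illustration of the capacity-balancing idea only: the sketch below moves one page at a time from the fullest member to the emptiest until their fill levels are within a tolerance. This is not the actual CLB algorithm, which is internal to the firmware; the function name `rebalance`, the `tolerance` parameter and the sample numbers are all assumptions made for the example.

```python
def rebalance(used: dict, capacity: dict, tolerance: float = 0.05):
    """Toy capacity balancer (NOT the real CLB): move single pages from the
    fullest member to the emptiest until fill levels differ by <= tolerance."""
    used = dict(used)  # work on a copy
    moves = []
    while True:
        frac = {m: used[m] / capacity[m] for m in used}
        src = max(frac, key=frac.get)  # fullest member
        dst = min(frac, key=frac.get)  # emptiest member
        if frac[src] - frac[dst] <= tolerance:
            break
        # Stop if one more page would overshoot and just reverse the imbalance.
        if (used[dst] + 1) / capacity[dst] > (used[src] - 1) / capacity[src]:
            break
        used[src] -= 1
        used[dst] += 1
        moves.append((src, dst))
    return used, moves


# Hypothetical pool: a nearly full small member next to a mostly empty big one.
new_used, moves = rebalance({"big": 1000, "small": 900},
                            {"big": 4000, "small": 1000})
print(new_used, len(moves))
```

Every entry in `moves` stands for a page that has to be copied between members over the SAN, which gives a feel for how much background traffic a new plan can generate.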

So there's a lot going on behind the scenes.

The only way to not have page balancing is to have only single-member pools, and then create multiple groups.

Don 
