April 14th, 2014 07:00

Gotchas DMX - VMAX Migrations with VMware

OK

I came across a couple of standard design gotchas that seem to catch people out, sitting in the gap between EMC best practice and actual system limits. I thought I would see if anybody else has seen similar gotchas when doing these migrations or designing solutions.

The first two gotchas are limits that seem unlikely to matter but are actually easily reached, especially if you go like for like from original devices that are smaller than what best practice now recommends for overall datastore size, or where things turn out not to be quite the way you believed as you go forward.

In most cases the lesson is to beware anything described as "like for like". What works now may not work once you actually move across and try to make use of all that extra capacity and grunt. The like-for-like statement turns out to be not quite true and proves a recipe to bite you in the arse.

The first one is the 1024 path limit for a single VMware ESX host, combined with the EMC port group recommendations.

When creating a port group, the present tool sets recommend that each port group contain 4 ports. Unlike a DMX, where we would individually mask a one-to-one relationship between host HBA and FA, the port group means we can create multiple connections very rapidly.

In my case we proposed setting up two port groups with four FAs in each, one for fabric A and one for fabric B. Fair enough: each port is set to 8 Gb, which gives us 64 Gb of bandwidth across all 8 paths.

But think about what this meant in terms of device count.

Each device added would add 8 paths. Not a problem as such, until you realise that this means we could only have 128 devices assigned to each ESX host (1024 / 8 = 128).

Therefore consider either using fewer ports per port group with multiple port groups, manually balancing across the required paths or ESX farm so that you still use the bandwidth, or use this as a better argument about the size of your datastores and what using smaller sizes actually means.
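To put numbers on the trade-off, here is a rough Python sketch of the arithmetic. It assumes the 1024 paths per host limit quoted above and that every device is presented down every FA port in every port group the host uses, which is how we got to 8 paths per device. The function name and parameters are made up for illustration and are not from any EMC or VMware tool.

```python
# Limit quoted in the post: 1024 paths per ESX host (check your own ESX version).
ESX_MAX_PATHS_PER_HOST = 1024


def max_devices_per_host(ports_per_port_group, port_groups,
                         max_paths=ESX_MAX_PATHS_PER_HOST):
    """Devices one host can see, assuming one path per FA port per device."""
    paths_per_device = ports_per_port_group * port_groups
    return max_paths // paths_per_device


# Two port groups of four FAs (the proposed design): 8 paths per device.
print(max_devices_per_host(4, 2))  # 128

# Two port groups of two FAs: 4 paths per device.
print(max_devices_per_host(2, 2))  # 256
```

Halving the ports per port group doubles the device headroom per host, at the cost of having to balance hosts across more port groups by hand.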

The other limit is also inconspicuous until you try to build flexibility into your SRDF setup.

For each RA pair you can only have 64 SRDF groups set up (present limit).

But EMC recommendations are normally to spread the replication load over as many RA pairs as possible for failover, bandwidth and redundancy. That is OK if we are talking about application-level failover or individual servers. We may never see this limit reached, as not all services are considered critical or replicated using SRDF.

The scenario here is that we had four RA pairs per site, all at 1 Gb. To maximise the bandwidth, it was proposed that we spread the load across all four RA pairs.

But then look at the way the scenario planned on using SRDF replication. In my case it called for async in consistent mode, so every device we wanted to manipulate as a group had to be in the same SRDF group and device group.

But each host, for flexibility in testing and failover, would have several separate groups defined for different aspects and applications of the host in a failover scenario, rather than a single group per host or application. This rapidly increased the number of SRDF groups needed.

Therefore, using all four RA pairs for each application or server would have meant a simple limit of 64 SRDF groups for the whole estate, since every group spread across all four pairs uses up a slot on each of them. That was not really compatible with the number of SRDF groups we needed.
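The same sort of back-of-envelope check works here. This Python sketch assumes the 64 groups per RA pair limit quoted above and a deliberately simple model in which each SRDF group takes one slot on every RA pair it is spread across, with groups spread evenly. Real Symmetrix accounting may well differ, and the names are mine, not Solutions Enabler's.

```python
# Limit quoted in the post: 64 SRDF groups per RA pair (present limit).
SRDF_GROUPS_PER_RA_PAIR = 64


def max_srdf_groups(ra_pairs, pairs_per_group,
                    per_pair_limit=SRDF_GROUPS_PER_RA_PAIR):
    """Upper bound on SRDF groups when each group uses pairs_per_group RA pairs."""
    if not 1 <= pairs_per_group <= ra_pairs:
        raise ValueError("pairs_per_group must be between 1 and ra_pairs")
    total_slots = ra_pairs * per_pair_limit
    return total_slots // pairs_per_group


# Every group spread over all four RA pairs (the proposed design).
print(max_srdf_groups(4, 4))  # 64

# Each group pinned to a single RA pair.
print(max_srdf_groups(4, 1))  # 256
```

So the more RA pairs each group uses, the fewer groups the estate can hold, which is the tension between the spread-the-load recommendation and a design that needs lots of small groups.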

So the second gotcha is to look at your SRDF design carefully and, where possible, restrict your replication based on the service need and criticality of the system you are replicating, and on its rate of change over the available SRDF links.

Has anybody got similar war stories or lessons learnt, where a seemingly simple design decision taken before the project started could have had, or did have, consequences for the final implementation of their solution?

