April 14th, 2014 07:00
Gotchas DMX - VMAX Migrations with VMware
OK
I came across a couple of standard design gotchas that seem to catch people out, sitting between EMC best practice and actual system limits, and thought I would see if anybody else has hit similar gotchas when doing this kind of migration or designing solutions.
The first two gotchas are limits that seem unlikely to be hit but are actually easily reached, especially if you try to do like for like with original devices that are smaller than current best practice would recommend for overall datastore size, or where things turn out not to be quite the way you believed as you go forward.
In most cases the lesson is to beware wherever the term "like for like" is used. What works now may not work when you actually get down to moving across and trying to make use of all that extra capacity and grunt, and where the like-for-like statement is not quite true it can come back to bite you in the arse.
The first one is the 1024 path limit for a single VMware ESX host, and how it interacts with EMC port group recommendations.
When creating a port group, the present tool sets recommend that each port group contain 4 ports. Unlike a DMX, where we would individually mask a one-to-one relationship between host HBA and FA, this port group ability means we can create multiple connections very rapidly.
In my case we proposed setting up two port groups with four FAs in each, one for fabric A and one for fabric B. Fair enough, that gives us a lot of bandwidth: each port is set to 8 Gb, giving 64 Gb across all 8 paths.
But think about what this meant in terms of device count.
Each device added would add 8 paths. Not a problem as such, until you realise that against a 1024 path limit this means each ESX host could only have 128 devices assigned (1024 / 8 = 128).
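To make the arithmetic explicit, here is a minimal sketch. It assumes one HBA port per fabric, with each HBA port zoned to every FA port in its fabric's port group, so paths per device = HBA ports × FA ports per fabric. The 1024 figure is the per-host limit quoted above; later ESXi releases may differ, so check the configuration maximums for your version. The function and constant names are hypothetical.

```python
# Per-host path limit quoted in the post (assumption: verify against the
# configuration maximums for your ESX/ESXi release).
ESX_MAX_PATHS_PER_HOST = 1024


def paths_per_device(hba_ports: int, fa_ports_per_hba: int) -> int:
    """Each host HBA port sees every FA port in its port group, so every
    device is reachable down hba_ports * fa_ports_per_hba paths."""
    return hba_ports * fa_ports_per_hba


def max_devices_per_host(paths: int, limit: int = ESX_MAX_PATHS_PER_HOST) -> int:
    """Largest number of devices a host can see before hitting the path limit."""
    return limit // paths


# Design from the post: one HBA port per fabric, four FA ports per fabric.
proposed = paths_per_device(hba_ports=2, fa_ports_per_hba=4)
print(proposed, max_devices_per_host(proposed))  # 8 128

# Alternative: two FA ports per fabric halves the paths and doubles the devices.
smaller = paths_per_device(hba_ports=2, fa_ports_per_hba=2)
print(smaller, max_devices_per_host(smaller))  # 4 256
```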
Therefore consider either using fewer ports per port group, with multiple port groups manually balanced across the required paths or ESX farms so the bandwidth still gets used, or use this as a stronger argument about the size of your datastores and what using smaller sizes actually means.
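The datastore-size side of that trade-off can be sketched the same way: given how much capacity each host must see and how many paths each device costs, the device limit fixes a minimum datastore size. This is a rough sizing aid under the same assumptions as the path arithmetic (1024 paths per host, evenly sized devices); the 100 TB figure and the function name are purely hypothetical.

```python
import math

# Per-host path limit quoted in the post (assumption: verify for your release).
ESX_MAX_PATHS_PER_HOST = 1024


def min_datastore_gb(total_capacity_gb: int, paths_per_device: int,
                     limit: int = ESX_MAX_PATHS_PER_HOST) -> int:
    """Smallest per-device size (GB) that lets one host see total_capacity_gb
    without exceeding the per-host path limit."""
    max_devices = limit // paths_per_device
    return math.ceil(total_capacity_gb / max_devices)


# Hypothetical example: 100 TB (102400 GB) presented to every host in a farm.
print(min_datastore_gb(102400, paths_per_device=8))  # 800
print(min_datastore_gb(102400, paths_per_device=4))  # 400
```

Halving the paths per device halves the minimum datastore size, which is the choice the paragraph above describes: spend ports on bandwidth per device, or on the number of devices a host can see.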
The other limit is also inconspicuous until you try to build flexibility into your SRDF setup.
For each RA pair you can only have 64 SRDF groups set up (present limit).
But EMC recommendations are normally to spread the replication load over as many RA pairs as possible for failover, bandwidth and redundancy. That is fine if we are talking about application-level failover or a handful of servers, and we may never see this limit reached, as not all services are considered critical or replicated using SRDF.
The scenario here was that we had four RA pairs per site, all at 1 Gb, and to maximise the bandwidth it was proposed that we spread the load across all four RA pairs.
But then look at how the scenario planned to use SRDF replication. In my case it called for asynchronous mode with consistency enabled, so every device we wanted to manipulate as a group had to be in the same SRDF group and device group.
But each host, for flexibility in testing and failover, would have several separate groups defined for different aspects and applications of the host in a failover scenario, rather than a single group per host or application. This rapidly increased the number of SRDF groups needed.
Therefore, spreading every application's or server's groups across all four RA pairs in this case would have meant a simple limit of 64 SRDF groups for the whole estate, since each group takes one of the 64 slots on every RA pair it uses. That was not really compatible with the number of SRDF groups we needed.
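A quick way to see why spreading every group over every RA pair caps the estate at 64 groups: model each SRDF group as consuming one slot on each RA pair it spans. This is a simplified sketch under that assumption, using the per-RA-pair limit of 64 quoted above and assuming groups are balanced evenly; the host and group counts in the example are hypothetical.

```python
RA_PAIR_GROUP_LIMIT = 64  # per-RA-pair SRDF group limit quoted in the post


def srdf_group_capacity(ra_pairs: int, pairs_per_group: int,
                        limit: int = RA_PAIR_GROUP_LIMIT) -> int:
    """Upper bound on SRDF groups when every group spans pairs_per_group
    RA pairs and groups are balanced evenly across the pairs."""
    if not 1 <= pairs_per_group <= ra_pairs:
        raise ValueError("pairs_per_group must be between 1 and ra_pairs")
    # Simplified model: each group consumes one slot on each RA pair it spans.
    return (ra_pairs * limit) // pairs_per_group


for k in (4, 2, 1):
    print(k, srdf_group_capacity(ra_pairs=4, pairs_per_group=k))
# 4 64
# 2 128
# 1 256

# Hypothetical estate: 20 hosts, each with 4 separate groups for testing/failover.
required = 20 * 4
print(required, required <= srdf_group_capacity(4, 4))  # 80 False
```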
So the second gotcha is to look at your SRDF design carefully and, where possible, restrict your replication based on the service need and criticality of the system you are replicating and its rate of change over the available SRDF links.
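One way to act on that advice is a tiered placement rule: only the most critical groups are spread over every RA pair, while everything else goes on a single, least-loaded pair. The sketch below is a hypothetical greedy planner, not an EMC tool or Solutions Enabler feature. The tier names, the `RAPair` class and the group names are all illustrative, and it only tracks the per-pair group limit quoted above, not bandwidth or rate of change.

```python
from dataclasses import dataclass, field

RA_PAIR_GROUP_LIMIT = 64  # per-RA-pair limit quoted in the post


@dataclass
class RAPair:
    name: str
    groups: list = field(default_factory=list)


def place_groups(groups, pairs, limit=RA_PAIR_GROUP_LIMIT):
    """Hypothetical greedy placement of (name, tier) tuples.

    'critical' groups span every RA pair; all other groups go on the
    least-loaded single pair. Returns {group name: [RA pair names]} and
    raises RuntimeError if a pair would exceed its group limit."""
    placement = {}
    # Place critical groups first so they get a slot on every pair.
    ordered = sorted(groups, key=lambda g: g[1] != "critical")
    for name, tier in ordered:
        if tier == "critical":
            targets = pairs
        else:
            targets = [min(pairs, key=lambda p: len(p.groups))]
        if any(len(p.groups) >= limit for p in targets):
            raise RuntimeError(f"no SRDF group slot left for {name}")
        for p in targets:
            p.groups.append(name)
        placement[name] = [p.name for p in targets]
    return placement


pairs = [RAPair(f"RA{i}") for i in range(1, 5)]
groups = [("erp_prod", "critical"), ("file_srv", "standard"), ("test_db", "standard")]
result = place_groups(groups, pairs)
print(result["erp_prod"])  # ['RA1', 'RA2', 'RA3', 'RA4']
print(result["file_srv"], result["test_db"])  # ['RA1'] ['RA2']
```

Each critical group costs a slot on all four pairs, while each standard group costs only one, so keeping the "spread everywhere" treatment for the genuinely critical services is what leaves headroom under the 64-group limit.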
Has anybody got similar war stories or lessons learnt, where a seemingly simple design decision taken before the project started could have had, or did have, consequences for the final implementation of their solution?

