
September 19th, 2011 01:00

Metro (Stretch) Cluster with PS6010

Hi

I have taken over the management of a VMware ESX 4.5i cluster, and the boss would like the cluster to stay available if one of the data centres becomes unavailable.

I have set up something similar before, but on EVAs with CA (Continuous Access).

So I am looking for HA features, not disaster recovery. What I would like to have is 3 nodes + a PS array at DC1 and 1 (or 2) nodes + a PS array at DC2.

Now, the systems running on this are 3-tier web/app/DB, basically J2EE stuff.

What I would like :)

Active/active replication between the 2 PS arrays, where active/active means I can write to a LUN that is being replicated on both arrays at the same time. This would make life easy: I just make sure I have enough bandwidth to handle replication, and latency isn't going to be too bad, as the DCs are within 30 km of each other. This way I can put all the nodes in 1 ESX cluster and use VMware's DRS/HA features to restart machines etc.

For the web tier it doesn't really matter... it's mainly reverse proxies. Apps: WebSphere clusters (2 nodes in DC1 and 1 node in DC2). The DB is going to be interesting (MS SQL); I have to investigate MS clustering or VMware's DRS... Networking (BGP) is already in place.

Next best is the same as above, but without active/active LUN replication. If the 2 PS arrays can be active/passive, i.e. in sync but with only 1 master, so that node 1 at DC2 talks to the LUN over the WAN link, that's sort of okay. But the system must be able to handle failover: if the PS at DC1 fails, the PS at DC2 must take over the role and provide all the LUNs.

This is also workable.

But from my reading of the PS documentation, it has replication, but it takes a snapshot and then replicates the snapshot. This doesn't work for me: I could lose committed transactions, and THAT'S A NO-NO.

I could use MS SQL replication, but ........

So, forum gurus, can I do what I want with the PS range, or do I have to move up to EMC/EVA/IBM/Hitachi...?

Alex

5 Practitioner


274.2K Posts

September 19th, 2011 09:00

Currently EQL does not do synchronous replication. The smallest replication interval is 5 minutes.
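To make the "could lose committed transactions" concern concrete: with snapshot-based asynchronous replication, a commit is only safe on the remote array once a snapshot taken after it has finished transferring. The sketch below is an illustrative model under that assumption, not EqualLogic's actual replication logic. The 1-minute transfer time and the timestamps are hypothetical; only the 5-minute interval comes from the thread.

```python
from datetime import datetime, timedelta


def commits_lost_on_failure(commit_times, replicas, failure_time):
    """Return commits that exist only on the primary when it fails.

    Illustrative model (an assumption): replicas is a list of
    (snapshot_taken, transfer_finished) pairs, and a commit is safe on the
    remote site only if it happened at or before the newest snapshot whose
    transfer finished before the failure.
    """
    # Snapshots that had fully reached the remote array before the failure.
    completed = [taken for taken, finished in replicas if finished <= failure_time]
    # If nothing has completed yet, no commit is protected.
    last_safe = max(completed, default=datetime.min)
    # Anything committed after the last replicated snapshot exists only locally.
    return [t for t in commit_times if last_safe < t <= failure_time]


if __name__ == "__main__":
    base = datetime(2011, 9, 19, 12, 0)
    interval = timedelta(minutes=5)  # smallest EQL interval quoted in the thread
    transfer = timedelta(minutes=1)  # hypothetical time to ship each replica
    replicas = [(base + i * interval, base + i * interval + transfer) for i in range(3)]
    commits = [base + timedelta(minutes=m) for m in (3, 9, 12, 14)]
    failure = base + timedelta(minutes=14, seconds=30)
    lost = commits_lost_on_failure(commits, replicas, failure)
    print([t.strftime("%H:%M") for t in lost])
```

With snapshots at 12:00, 12:05 and 12:10 and a failure at 12:14:30, the commits at 12:12 and 12:14 exist only on the failed array. In this model the worst-case exposure is roughly one interval plus one transfer time.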

A hybrid approach would likely be best. For ESX there is SRM (Site Recovery Manager). Using EQL replication, all the VMFS volumes and non-SQL volumes would use standard replication. If you had to bring up the remote DR site, SRM would let you get it up and running quickly: SRM knows how to promote the replicas, scan in the VMFS volumes, re-register the VMs and start things back up. For the most time-sensitive volumes, a host-based synchronous replication program would be better. This would save you a lot of time and money to implement.

It's what many of our customers do.

Regards,

-don

72 Posts

September 19th, 2011 05:00

Dell

76 Posts

September 19th, 2011 18:00

Any pointers to what sort of host-based sync, i.e. for MS SQL? Or are you talking about MS SQL replication?

5 Practitioner


274.2K Posts

September 20th, 2011 04:00

Not really, no. That's not my area of expertise. What I see customers using is products like Double-Take, though I haven't worked with it myself.

Sorry.

-don

76 Posts

September 20th, 2011 05:00

Okay, thanks. I think I am going to look at what a hardware solution costs.

76 Posts

December 13th, 2012 14:00

Updating an old question: the new PS release has sync replication. Currently they specify < 5 ms latency, i.e. a campus LAN.

Going to give that a go

5 Practitioner


274.2K Posts

December 13th, 2012 17:00

It works very well as long as bandwidth and latency aren't restricted. SYNC replication means that the local host write isn't ACKed until it makes it to the SYNC alternate pool, so more latency equals reduced write performance. Plus, should you have to switch to that alternate, ALL the local I/O traffic will be going through your link, so plan accordingly. A 100 Mb LAN isn't going to work very well. ;-)
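A rough way to see why latency and bandwidth both matter: if each write is only acknowledged after the remote copy lands, the link round trip is added to every write, which caps how fast a stream of dependent writes (for example, commits that each wait for their log write) can complete. The sketch below uses a conservative serial model and a hypothetical 70% usable-link allowance; the per-write latencies and the 20 MB/s write rate in the usage are made-up inputs, not EqualLogic figures.

```python
def sync_write_latency_ms(local_ms, link_rtt_ms, remote_ms):
    """Host-visible latency of one synchronously replicated write.

    Conservative serial model (an assumption): local write, then the trip over
    the link, the write on the alternate pool, and the ack back.
    """
    return local_ms + link_rtt_ms + remote_ms


def max_iops_per_outstanding_io(latency_ms):
    # With one write in flight at a time, at most 1000 / latency complete per second.
    return 1000.0 / latency_ms


def link_can_carry(write_mb_per_s, link_mbit_per_s, usable_fraction=0.7):
    """True if the replicated write stream fits in the link.

    usable_fraction is a hypothetical allowance for protocol overhead and
    other traffic sharing the link.
    """
    return write_mb_per_s * 8 <= link_mbit_per_s * usable_fraction


if __name__ == "__main__":
    for rtt in (0.0, 3.0):
        lat = sync_write_latency_ms(1.0, rtt, 1.0)
        print(f"RTT {rtt} ms -> {lat} ms per write, {max_iops_per_outstanding_io(lat):.0f} IOPS")
    print("20 MB/s over 100 Mb:", link_can_carry(20, 100))
    print("20 MB/s over 2 x 1 Gb:", link_can_carry(20, 2000))
```

Going from 0 to 3 ms of round trip drops a single dependent write stream from 500 to 200 IOPS, and a 20 MB/s write load (160 Mb/s) does not fit in a 100 Mb link at all.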

76 Posts

December 13th, 2012 18:00

Yep, yep and yep. Currently planning on 2 x 1G links.

It's only a 14 km fibre run between sites, so latency is going to be low.
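A quick back-of-the-envelope check on "latency is going to be low": light in optical fibre travels at roughly two-thirds of c, about 5 µs per km one way. The sketch computes propagation delay only; switch, router and array time come on top. Treating the quoted < 5 ms as a round-trip budget is an assumption.

```python
# Light in fibre covers roughly 200,000 km/s, i.e. about 5 microseconds per km one way.
FIBRE_US_PER_KM = 5.0


def fibre_rtt_ms(distance_km, us_per_km=FIBRE_US_PER_KM):
    """Round-trip propagation delay over a fibre run, in milliseconds.

    Excludes switching, routing and array processing time.
    """
    return 2 * distance_km * us_per_km / 1000.0


def within_budget(rtt_ms, budget_ms=5.0):
    # budget_ms defaults to the "< 5 ms" figure quoted in the thread,
    # assumed here to be a round-trip number.
    return rtt_ms < budget_ms


if __name__ == "__main__":
    for km in (14, 30):
        rtt = fibre_rtt_ms(km)
        print(f"{km} km: ~{rtt:.2f} ms round trip, within budget: {within_budget(rtt)}")
```

Both the 14 km run and the 30 km figure from the original question come out well under a millisecond of propagation delay, so in practice the equipment in the path, not the distance, is likely to dominate.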

My only concern was how long the primary EqualLogic waits for the ack from the remote site before it times out.

So:

MS SQL is sitting on a sync-replicated LUN... a transaction gets committed... the fibre link goes down... the ack isn't coming from the remote site.

The SQL Server is waiting on the write ack. How long does the primary site's EqualLogic wait?
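Whatever the actual value turns out to be, the shape of the problem can be sketched: the committing SQL Server session is blocked for the whole timeout, and what happens next depends on the array's policy. The timeout value and both policies below (`continue_out_of_sync`, `fail_write`) are hypothetical placeholders for illustration, not EqualLogic's documented behaviour.

```python
from typing import Optional, Tuple


def sync_write_outcome(local_ms: float, ack_ms: Optional[float], timeout_ms: float,
                       on_timeout: str = "continue_out_of_sync") -> Tuple[float, str]:
    """Return (host_visible_latency_ms, resulting_state) for one write.

    ack_ms is the remote round trip for this write, or None if the link is down.
    timeout_ms and the on_timeout policies are hypothetical placeholders.
    """
    if ack_ms is not None and ack_ms <= timeout_ms:
        # Normal case: the remote ack arrives in time and both copies match.
        return local_ms + ack_ms, "in_sync"
    # No ack in time: the host has been stalled for the whole timeout.
    stalled = local_ms + timeout_ms
    if on_timeout == "continue_out_of_sync":
        # Acknowledge from the local copy and stop mirroring until the link returns.
        return stalled, "out_of_sync"
    if on_timeout == "fail_write":
        # Return an error, so the host (here SQL Server) sees a failed write.
        return stalled, "write_failed"
    raise ValueError(f"unknown policy: {on_timeout}")


if __name__ == "__main__":
    print(sync_write_outcome(1.0, 0.5, timeout_ms=1000.0))
    print(sync_write_outcome(1.0, None, timeout_ms=1000.0))
    print(sync_write_outcome(1.0, None, timeout_ms=1000.0, on_timeout="fail_write"))
```

Either way, the timeout the array uses is added directly to the commit time of any transaction in flight when the link drops, which is why its actual value matters for SQL Server.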
