March 21st, 2011 06:00

Smart Copy vs. Snapshots? Help with config.

We have just set up an EqualLogic PS6000, and all of our VMware ESXi hosts are connected to it. I installed and configured the EqualLogic HIT kit for VMware and have started playing around with it. My question is: is it better to use the "smart copy" / Auto-Snapshot Manager via this tool, or should I use the snapshots that you configure in the Group Manager application? Are there any differences, advantages, or disadvantages?

Thanks!

7 Technologist

 • 

729 Posts

March 21st, 2011 07:00

Inflxx-ec,

Hi, I'm Joe with Dell EqualLogic.

ASM/VE and the PS Series Group Manager work together to provide a comprehensive snapshot solution for your environment. For crash-consistent hardware snapshots, you can use the PS Series Group Manager. For file-system consistent copies of VM datastores (including copies that include a virtual machine memory dump), you would use ASM/VE.

Regards,

Joe

203 Posts

March 31st, 2011 21:00

I posted a series on replication using the built-in tools that come with your EqualLogic array (ASM/VE, ASM/ME, and SANHQ). Even though the series is focused on replicating between two groups, I think you will find some valuable information on 1.) why you should be using these tools, and 2.) a few tips to make them work for you. Here is Part 2 of the 5 part series.

 

203 Posts

April 5th, 2011 08:00

Anything with regards to reserves for snapshots or replicas is ALWAYS set at the group manager on the SAN. 

 

A “smartcopy” in the EqualLogic sense is a native volume/LUN based snapshot that has some intelligence behind it (quiesced, VM independent, etc.).  So yes, they take away from the reserves you set. 

 

The HIT and ASM/ME will only work on guest attached volumes. ASM/VE will only work on VMFS volumes and VMs that vCenter can see, as it rides on top of vCenter's API.

 

When you set a snapshot reserve, that is the maximum amount of space the snapshots can consume. The default setting is 100%, so if you create a 100GB volume with a 100% snapshot reserve, you will see that it occupies 200GB of pool space. Replica reserves are similar. When you set up replication, it will set a “Local replication reserve” amount. This guarantees that the SAN can make a special snapshot for the purposes of replicating (it then removes it afterward). Now, if you leave all 3 at their defaults, they can suck up space pretty quickly. To counter this, you can do a couple of things:

 

For snapshots, trim the snapshot reserve down from 100% to something that better fits the change rate of the data. Something like 60% perhaps. Again, this will vary depending on the type of data. Keep track of how many snaps can be made before the oldest ones fall off.

 

For replicas, assuming you have free space, set your “Local replication reserve” to the minimum of 5%, then tick the checkbox for “Allow temporary use of free pool space.” This means that when it's making a replica, it will actually use the free pool space to create that temporary/hidden snapshot. It will consume that free pool space until it has time to replicate it. Once it’s finished, it will remove it.
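To put numbers on the reserve settings above, here is a minimal back-of-the-envelope sketch. It assumes both reserves are simple percentages of the volume size and ignores thin provisioning and any borrowing of free pool space; the function name `pool_space_gb` is made up for illustration and is not part of any EqualLogic tool.

```python
def pool_space_gb(volume_gb, snapshot_reserve_pct=100, local_repl_reserve_pct=0):
    """Estimate pool space (GB) taken by a volume plus its reserves.

    Assumption: each reserve is a plain percentage of the volume size.
    """
    snapshot_gb = volume_gb * snapshot_reserve_pct / 100
    local_repl_gb = volume_gb * local_repl_reserve_pct / 100
    return volume_gb + snapshot_gb + local_repl_gb


# 100GB volume with the default 100% snapshot reserve, no replication
print(pool_space_gb(100))          # 200.0

# Same volume with a 60% snapshot reserve and a 5% local replication reserve
print(pool_space_gb(100, 60, 5))   # 165.0
```

Running it for each volume in a pool gives a rough idea of how much space the defaults cost compared with trimmed reserves.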

And finally, if you are using ASM/VE, make sure that the groups of machines being snapped are all in the same VMFS volume. Otherwise, it's going to make snaps on other VMFS volumes, and it's going to confuse the daylights out of you. My blog on replication (link in my earlier post) has more info about this.

57 Posts

April 5th, 2011 08:00

Does using Smart Copies take away from the snapshot reserve that is set on the volumes? Is it recommended to use both the Group Manager and the VMware HIT?

I am still a bit confused about whether I should be configuring these in both places. Right now they are configured only in our Group Manager, and I want to make sure that if a VM fails or we need to revert to a certain point in time, this is done in the most efficient manner.

Thanks.

57 Posts

April 5th, 2011 09:00

Thanks for the info. I guess the only thing I am really questioning is whether I need both Smart Copies via ASM/VE and snapshots via the Group Manager, or whether I can get away with using one. If I can get away with just one, which is the preferred / best practice choice?

Thanks a lot for the input!

203 Posts

April 5th, 2011 09:00

That's a great question! ...If you make a snapshot of a volume in the Group Manager, it has no intelligence behind it. It makes a crash-consistent snapshot of the data on that volume. It has no idea what is going on in I/O, memory, etc. Snaps via the Group Manager are typically only good for volumes where no VMs are turned on and no data is being accessed. They are completely fine for that scenario.

Smartcopies via ASM/VE should be considered critical to protecting the state of your VMs. They make a VM-consistent snapshot, then do a little trickery to turn that into a LUN-based snap. The benefit of this is that you never have the accumulating/journaled type of snaps on your VMs.

Smartcopies via ASM/ME are absolutely critical for servers with guest attached volumes (think Exchange, SQL, etc.).  They make an application consistent snapshot of those data volumes, by leveraging VSS inside of each VM.  They are the best thing since sliced bread once you realize what exactly they do.

So you will have a mix of tools to address specific issues. Remember, your goal is to have protection that you can recover from. Leveraging both ASMs gives you the ability to do so. It will be up to you to determine how best to protect your environment.
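The rule of thumb in this post can be written down as a small lookup. This is only a sketch of the guidance given above, not anything the EqualLogic tools expose; the function name `suggest_tool` and the volume labels `"guest_attached"` and `"vmfs"` are hypothetical.

```python
def suggest_tool(volume_kind, has_running_workload=True):
    """Map a volume to the snapshot tool suggested in this thread.

    volume_kind: "guest_attached" (iSCSI volume mounted inside the VM)
                 or "vmfs" (datastore holding VMs). Labels are illustrative.
    """
    if not has_running_workload:
        # Nothing is writing to the volume, so a crash-consistent snap is fine.
        return "Group Manager snapshot"
    if volume_kind == "guest_attached":
        # Application-consistent copy coordinated through VSS in the guest.
        return "ASM/ME Smart Copy"
    if volume_kind == "vmfs":
        # VM-consistent copy coordinated through vCenter.
        return "ASM/VE Smart Copy"
    raise ValueError(f"unknown volume kind: {volume_kind}")


print(suggest_tool("guest_attached"))   # ASM/ME Smart Copy
print(suggest_tool("vmfs"))             # ASM/VE Smart Copy
print(suggest_tool("vmfs", False))      # Group Manager snapshot
```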

Please re-read the entire 5 part Replication series.  It goes into detail over why these are needed, and how they work.

57 Posts

April 6th, 2011 12:00

When using the Smartcopy via the HIT in VMware, do I want to select "Perform virtual machine memory dump"? What benefit does selecting it have, and what impact would it have? It is not selected by default and I was not going to select it, but I am not sure what the benefits and drawbacks of using it would be.

Thanks.

203 Posts

April 6th, 2011 12:00

For smart copies you will want it to perform a memory dump. This will coordinate with vCenter and the VMware Tools installed on that particular VM to make a VM-consistent snapshot.

You will not want to tick that checkbox if you were doing an ASM/VE smartcopy replica.  Only do it on an ASM/VE smartcopy snapshot.

5 Practitioner

 • 

274.2K Posts

April 6th, 2011 16:00

Hello, 

 

Re: Memory dump isn't a good standard option to use, IMO. It adds significant time to the process, since potentially several GB of memory need to be dumped out. On SQL / Exchange servers I've seen that cause timeout errors, since the VM is paused during the memory dump process, especially if the VCS is also a VM. It doesn't increase the consistency of the data in the snapshot, since pending writes are already flushed to disk when the ESX snapshot is created.

Usually a snapshot is used to recover a file, not to return the VM to the exact state it was in. I.e., if you have a snapshot that's 2 or 3 days old, what was in memory at that time isn't likely to be of any benefit now. It will also increase the size of the snapshot, since the additional writes are being done on the same volume by default.
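As a rough illustration of the time and space cost, here is a minimal sketch. It assumes the dump time is dominated by writing the VM's configured memory sequentially at a steady rate, and that the dump adds about the VM's memory size to the snapshot; the function name and the 100 MB/s figure are made-up inputs, not measured values.

```python
def memory_dump_estimate(vm_memory_gb, write_mb_per_s):
    """Return (pause_seconds, extra_snapshot_gb) for a memory dump.

    Assumptions: the VM is paused for as long as it takes to write all of
    its memory, and that memory lands on the same volume as the snapshot.
    """
    pause_seconds = vm_memory_gb * 1024 / write_mb_per_s
    extra_snapshot_gb = vm_memory_gb
    return pause_seconds, extra_snapshot_gb


seconds, extra_gb = memory_dump_estimate(vm_memory_gb=8, write_mb_per_s=100)
print(f"about {seconds:.0f} s paused, about {extra_gb} GB added to the snapshot")
```

Even with generous throughput, a pause of that order is long enough to explain the timeout errors mentioned above.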

 

Regards, 

 Don

 

203 Posts

April 6th, 2011 17:00

You bring up a great point, Don. Enabling the memory dump on a smartcopy snapshot does increase the time it takes to process the snap. It will be noticed on machines of all types, anything from a DC to an Exchange server, and as you stated, may cause timeouts. It is why I generally do all of my ASM/VE snaps during the evening, when it isn't as noticeable. However, there are a few points worth observing:

1.  All of my Exchange, SQL, and flat-file servers use guest-initiated drives and ASM/ME. Thanks to that, and VSS, those snaps are nearly bulletproof, with no perceived delay for the end user.

2.  Because of the above, I'm absolutely interested in a point-in-time recovery of the exact state of the VM, even if it induces a bit of a delay during the creation process. We know our protection is only as good as our ability to recover. This applies not only to the data we have, but to the systems that serve up that data. When it's dealing with the system that serves up the data, it's all about point-in-time recovery.

I've asked for additional documentation from Dell in the past on this particular matter. Understanding every detail about a setting such as this allows the users/customers to configure the setting in the most appropriate way. I never received anything further from Dell/EqualLogic on the matter. I'd rather not do experimentation on my production servers to see if a snap without a memory dump is good enough. :-) Dell/EqualLogic would have the facilities to test, and the understanding to explain what types of actions it may be leveraging from the VMware SDK, etc., so that all of us could make the best decision on when and where it is most appropriate.

Great points though Don.  ...A good discussion indeed.

- Pete

5 Practitioner

 • 

274.2K Posts

April 6th, 2011 20:00

Hello, 

Before an EQL snapshot is taken, the app calls Virtual Center and requests that a VMware snapshot be taken, with or without the memory dump option. So memory dump is analogous to the hibernate function on a laptop. Again, I wonder why you would need the memory contents saved with the VM's C: drive days or weeks later? Your data is going to be on the Data/Log volumes snap'd via ASM/ME, not in the ASM/VE snapshot of the C: drive. Doing a memory dump won't make your C: drive snapshot any better. It just gives you the possibility of starting up a VM where it left off.

For SQL or Exchange, using  ASM/ME is a better choice for sure. 

Re: Testing.  I'm not sure I fully understand what documentation you are looking for.  The feature comes from VMware not Dell.   Dell/Equallogic leverages a feature they provide via the SDK.   Details on that feature would have to come from VMware.     

Regards, 

 Don 

 

203 Posts

April 6th, 2011 22:00

Thanks for the response Don.  I would agree that I’m not interested in memory contents of an ASM/VE Smartcopy snapshot of a system that may be several days old.  Pretty useless in fact.  What I am interested in is the ability to recover the system from the last known, good state.  Without the memory dump, it is crash consistent.  Not that it is an exact comparison to its physical counterpart, but give me an option of a graceful shutdown or hibernation on a physical system, versus pulling the plug, and I’ll choose the former.  It’s a good data point that other users use smartcopies without a memory dump with success.  That is good to hear.

 

I wanted to clarify that I’m not suggesting that a memory dump is required (after all, I rely on replicas, which do not use memory dumps), but I’m going with what is going to provide the best odds for success. If there is information that suggests going with no memory dumps gives better odds for a recoverable system, then great. I haven’t found that info yet.

 

I understand that ASM/VE leverages VMWare’s SDK.  But there are some functions that are exclusive to ASM that are worth documenting. 

 

· Memory dump is indeed a VMware function, but if you make a snapshot in ESX, even under moderate load, you will see almost no hesitation in the snap. With an ASM/VE smart copy, you will. Obviously, there is some locking going on here to extract that journaled snap into a volume snapshot, as well as some other inefficiencies, etc. Maybe I overlooked that documentation somewhere, but so did everyone else, because I get asked that question a lot.

· The ASM/VE option of “Include PS Series volumes accessed by guest iSCSI initiator” is a setting exclusive to ASM/VE. The hypervisor has no idea about that guest attached volume. There are a lot of implications here regarding how this is coordinated, what it offers and what it doesn’t offer, etc.

· Ambiguity on the “Save information about all other virtual machines on included datastores” option. It would be interesting to have a typical customer repeat back to EqualLogic what this toggle does.

· ASM/VE is great, but the documentation makes no mention of how the snapshots of VMs in different datastores materialize. It’s really easy for an ASM/VE user to make an ASM/VE smartcopy of a couple of VMs and then see volume snaps that don’t necessarily correlate to what the user thinks they should see. Only when a user aligns their folders of VMs in vCenter to match up with the contents of each datastore does the entire smartcopy process make sense. This is worth documenting.
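The datastore alignment point in the last bullet can be illustrated with a small sketch: given which datastores each VM lives on, list every volume a smartcopy of the selected VMs would have to snap. The inventory dictionary is hand-written example data with made-up VM and datastore names; nothing here queries vCenter or ASM/VE.

```python
from collections import defaultdict


def volumes_touched(selected_vms, vm_to_datastores):
    """Group the selected VMs by the datastore (volume) they live on."""
    touched = defaultdict(list)
    for vm in selected_vms:
        for datastore in vm_to_datastores[vm]:
            touched[datastore].append(vm)
    return dict(touched)


# Hypothetical inventory: db01 has disks on two datastores.
inventory = {
    "web01": ["DS-A"],
    "db01": ["DS-A", "DS-B"],
    "file01": ["DS-C"],
}

print(volumes_touched(["web01", "db01"], inventory))
# {'DS-A': ['web01', 'db01'], 'DS-B': ['db01']}
```

Selecting two VMs ends up snapping two volumes here, which is the kind of surprise that goes away once the vCenter folders mirror the datastore layout.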

 

As nothing more than an “In the trenches” EqualLogic Customer, I’ve had the opportunity to not only write about these tools, but speak at various events about them.  I am such an advocate of their abilities.  I want to share with potential, new, and existing EqualLogic customers on how they can provide real solutions to real problems at no additional cost, and why they should be using them.  But I get more questions about what in the world these tools are, and how they should be used.  I learned what to do after observing its behavior, and a lot of trial and error (not ideal).  What I learned, and found to be most successful in a production environment wasn’t really addressed in any documentation I found.

 

On a tangential note, I would also contend that ASM/ME (assuming a guest attached volume) is also better for high or sensitive file I/O on flat file storage. We run our source code control server, which is file based (Subversion), where there are thousands of tiny files going in and out at any given time. Early on in my deployment (before I knew better), I used to let ASM/VE capture those guest attached volumes for me. However, as you might know (but many may not), when you tick that checkbox in ASM/VE, it does NOT coordinate I/O using VSS inside that guest VM. We ended up having total file corruption, and were only able to save it with some other trickery. After I changed ASM/VE to protect only the VM’s C:\ drive and let ASM/ME do all of the big work of protecting the data volume, it has been rock solid and extremely robust (even after a significant amount of load testing to make sure it was going to work). Personally, I think that option to let ASM/VE make a snap of the guest attached volume should be completely removed from ASM/VE. Leave that purpose for ASM/ME.

 

Cheers to a constructive conversation.  Thank you.

5 Practitioner

 • 

274.2K Posts

April 7th, 2011 06:00

Re: Crash consistent. Even without memory dump it's more than crash consistent, since the VM is paused and all pending IOs are written to disk before a VMware snapshot is created. The memory dump means you'd have access to the state of the processor and other things in memory besides IO. Since the snapshot is about storage, that's already taken care of. The memory dump is just excess baggage to me. That's why I don't find it interesting or needed. The C: is already consistent.

The actual ESX snapshot creation time is a function of ESX.   The only reason I can think of why it would be different is that multiple snapshots taken at the same time are optimized in the ESX SDK.   Since that's all that's happening.  ASM/VE asks VCS to snap the selected VMs.  When you do it via the GUI, you're doing it one at a time.  

Re: guest iSCSI initiator. I agree with you there. I do know that you don't get the full integration features that you would if you used ASM/ME on those volumes. But via the VMware Tools, VSS should still be called and pending IO to those volumes should be flushed. So you will get a more consistent snapshot compared to one done via the EQL GUI itself, but potentially a less consistent one than ASM/ME would give you. So for "lighter" applications that may be sufficient. I've never had a case where that caused corruption. Even if it didn't call VSS, the only data that would not make it to the snapshot would be what was in the pending cache of the VM, which would be no different a situation than if your VM BSOD'd or the ESX server itself failed. In those cases too you lose pending IO.

IMHO, and only IMO, if you have set up storage direct, use ASM/ME to get the most benefit. But the ASM/VE option would allow someone to have fewer schedules for, say, a fileserver or some application that isn't integrated into ASM/ME at this time.

Have you tried ASM/VE v3.0 yet? If not, please do. I think some of your concerns about how snapshots are visualized will be addressed. One problem with such alignment issues is the nature of how flexibly ESX/VCS can be configured. Folders, datastores, clusters, etc.... It's hard to align another GUI on top of that, since you can't predict what it will look like initially or later, since you can move and change things at will. ASM/VE now has a plugin inside VCS to make it easier to use.

I agree 100% that ASM/ME is better for high IO or sensitive data.   Especially with Exchange and SQL.  Plus more apps to follow in future.  So I usually suggest a combination approach as you mentioned. 

-don

203 Posts

April 7th, 2011 08:00

Good info for the readers Don.  Thanks.

No, I haven't had a chance to try the V3/EPA version of ASM/VE.  Looking forward to it though!  The demo I saw of it really showed some refinement in the product.  Great news for all of us!

1 Message

April 15th, 2011 21:00

Great thread, guys. I was wondering what best practice would be for my situation. I am currently simply using the SAN to SAN replication feature to replicate my VMFS volumes between our production home office and our co-location site. In my testing it has been OK bringing up the machines in a crash-consistent state. However, I am now looking to move to ASM replicas to better ensure recovery integrity. From reading what you have written, it seems like I should use ASM/VE to replicate my VMs' C drives and typical server configs, and I should use ASM/ME to replicate my Exchange data and SQL data. Currently all my VMs are set up with persistent drives within the VMs. How would I set up my VMs to enable ASM/ME to back up SQL, Exchange, and maybe my heavy-duty file serving data drives? Do I need to migrate the data to raw drives set up on their own LUNs on the SAN?

Any ideas and help you may have to offer would be appreciated.

 
