
June 25th, 2020 01:00

How to restore (revert) data from a snapshot on SC4020

Is there a documented procedure for restoring data from a snapshot on the Compellent SC4020? I cannot see any options for this on the snapshot or the volume.

One possible way would be to:

  1. Create view on snapshot
  2. Copy view to volume
  3. Unmap and delete original volume 
  4. Map new volume to server

This is very convoluted and means the server now sees a new set of LUN IDs. Surely there is an easier way to do this - all other snapshot technologies I have used simply have a "restore data from snapshot" option.

Thanks

Mike

2.9K Posts

June 25th, 2020 07:00

Good morning,

 

https://dell.to/382j5dK

 

Linked above is the Admin Guide for Dell Storage Manager, which should have the functionality you're looking for. Page 116 goes into detail on creating recovery volumes from snapshots.

 

The important thing to note is that you'll want to unmap the Recovery (View) Volume once you're done using it, or the snapshot it was taken from won't expire.
This method restores data in a side-by-side manner, so you won't have to do anything with the original LUN ID. Let me know if that helps clear things up, or if you have other concerns.

July 1st, 2020 02:00

Thanks for your reply.

There are 2 reasons to take a snapshot:

  1. For recovery, so you can roll back to this snapshot
  2. So that you present the snapshot simultaneously with the original volume (normally on a different server) for purposes such as backups, reporting or refreshing a test system with live data

When you create the snapshot, the process is the same for 1 and 2, and hence the section you refer to has an option "Create Volume from Snapshot" with NO reference to "recovery". That is correct, as this could be used for purpose 2 above, which is NOT for recovery. So the option in the GUI does NOT say "Create a Local RECOVERY Volume from Snapshot", and I don't know why the title of the section does.

What usually differentiates a snapshot used for recovery is the way you "use" it, not the way you "take" it: normally you are able to roll the volume back to the point in time (PIT) the snapshot was taken, and there is no need to create a view in this process. But I cannot find the process of using a snapshot for recovery in this guide, so I can only assume the process is as outlined in my opening post (though in step 3, you don't have to delete the original volume - you can just unmap it and, if you have enough space, keep it for a while).

The downside of this approach is that it has more steps than simply rolling back the volume, and it seems to be undocumented - which is exactly what you don't want when your data is corrupt, your system is down, and you need to roll back to your snapshot.

In addition, the snapshot volumes will have different LUN IDs, so on the host side you have extra steps: not just unmounting the filesystems, but also deporting volume groups, removing devices and rescanning. The bigger problem is that with different LUN IDs there is no guarantee the O/S devices are created with the same names as before. In fact, if you have, say, 20 volumes, then unless you took the snapshot of each volume in the order you created the original volumes, you are guaranteed to have different disk device names on the host. This means you have to map the old LUN ID to the new LUN ID and work out the new disk device name to be able to access the right data - not something you want to be doing when your system is down and you need to recover quickly.
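The renaming problem above can be sketched with a toy model (this is an illustration of OS discovery-order naming in general, not of actual SC4020 or HP-UX behaviour; all serial numbers are made up). If diskN names are handed out in discovery order, then snapshot LUNs with new serials, discovered in a different order, end up with different names:

```python
# Hypothetical model: the OS assigns diskN names in discovery order,
# so only identical serial numbers guarantee identical device names.

def assign_device_names(luns_in_discovery_order):
    """Map each LUN serial to a diskN name, in discovery order."""
    return {serial: f"disk{i}" for i, serial in enumerate(luns_in_discovery_order)}

# Original volumes, discovered in creation order
orig = assign_device_names(["serial-A", "serial-B", "serial-C"])

# Snapshot view volumes get NEW serials and may be discovered in another order
snap = assign_device_names(["snap-C", "snap-A", "snap-B"])

# Correlate old device name -> new device name via the known serial pairing
serial_map = {"serial-A": "snap-A", "serial-B": "snap-B", "serial-C": "snap-C"}
rename = {orig[o]: snap[n] for o, n in serial_map.items()}
print(rename)  # {'disk0': 'disk1', 'disk1': 'disk2', 'disk2': 'disk0'}
```

Every device name changed even though only three volumes were involved, which is the mapping exercise described above.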

So am I correct that my opening post describes the process for recovery, and is this process documented anywhere - i.e. is there a specific procedure for how to recover your data using a snapshot?

 

2.9K Posts

July 1st, 2020 10:00

I followed up and spoke to a few Compellent support reps who have said the information I provided is correct. You can certainly try the process you laid out, but it isn't what they recommended. If your way does work, then that'll be good to hear. 

 

If you'd like, you can send me your service tag (or serial number) and I can check the warranty status and find the phone support number for your region.

July 1st, 2020 13:00

I'm sorry, I'm confused - I thought you were suggesting my procedure, which is very similar to the procedure you referenced:

Let's give an example with a single LUN:

The LUN is called ORIG and has serial number 0. It is mapped to the host; the host sees disk0 belonging to VG0, mounted as /data.

The snapshot of ORIG is called SNAP, which, when a volume is created from it, gets serial number 1.

I unmount /data, ready to restore data from the snapshot.

So in the procedure you reference:

Step 5: "Create Volume from Snapshot", which is also called creating a view on a volume - i.e. it is not a full volume, it is a view which relies on BOTH ORIG and SNAP. This is step 1 in my procedure.

Step 8: Map the recovery volume to the server - this is step 4 in my procedure.

So now, on the host, the VG0 on ORIG is still there. If this host is Linux, you now have a second VG called VG0 on SNAP, and most systems can't cope with this, so you may have just corrupted your data. If this host is HP-UX (which mine is), then the new SNAP disk does not have a VG name, because the VG name is stored on the host and the host does not know what serial number 1 is. The disk device name for SNAP on the host cannot be disk0, as this already exists, so I now have a new disk1, and I can't create a VG0 as this already exists. If I create a new VG1, this will change the block device, so I would then have to update fstab to reflect this.

So I think you can see that you need to unmap ORIG before mapping SNAP, which is what I did in step 3.

Even if you unmap first, SNAP may still be assigned a disk device other than disk0, and if you have 20 disks these WILL have different device names unless you create the snapshots in exactly the same order you created the original volumes - and even then there is no guarantee you will get the same device names (the only guarantee is if the disks have the same serial numbers).

So now I have a mapping exercise: map the old serial numbers to the new serial numbers and correlate these with the new disk device names.
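That correlation step can at least be scripted. A minimal sketch, assuming a Linux-style `/dev/disk/by-id` listing (link name, target device); the serials and device names below are invented, and on HP-UX the equivalent information would come from its own device tooling:

```python
def build_serial_map(by_id_entries):
    """Parse (link_name, target_device) pairs, like the symlinks under
    /dev/disk/by-id, and return {serial: device}."""
    serial_to_dev = {}
    for link, dev in by_id_entries:
        if link.startswith("scsi-"):
            serial_to_dev[link[len("scsi-"):]] = dev
    return serial_to_dev

# Sample data standing in for listings taken before and after the re-map
old = build_serial_map([("scsi-0000", "sdb")])   # ORIG, serial 0
new = build_serial_map([("scsi-0001", "sdc")])   # SNAP view, serial 1

# Array-side relationship you must record: ORIG serial -> SNAP serial
orig_to_snap = {"0000": "0001"}

# Final correlation: old device name -> new device name
remap = {old[o]: new[s] for o, s in orig_to_snap.items()}
print(remap)  # {'sdb': 'sdc'}
```

The array-side serial pairing still has to be captured by hand (or from the array CLI) at snapshot time; the script only joins the two listings.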

So if I do all this, I can now mount /data. If I am happy with the restored data, I then want to remove ORIG, but I can’t do this as my view volume relies on it, so I need to copy the view volume into a new volume – this is my step 2.

So the procedure you reference is the same as mine except it is missing steps 2 and 3, which need to be done at some point. Unless I am missing something, the procedure in the manual still requires the two extra steps I do, and presents “recovery” LUNs with different serial numbers than the originals, which causes a lot of work.

In other technologies, you simply roll ORIG back to SNAP, so you are not changing any serial numbers, you don’t have to do any unmapping and remapping, and on the host you just mount the data. So I just want to make sure that this method of keeping the original serial numbers is NOT possible before we test recovery in a couple of weeks, where the current plan is the convoluted procedure I have outlined.

If I am wrong and there is an easier process, then great, please share this.

Just to add: this would be no different if the host were Windows. If you have 20 LUNs with drive letters through W:, and you present 20 new LUNs with different serial numbers, your drive letters are not guaranteed to be the same as before.

2.9K Posts

July 2nd, 2020 07:00

Good morning Mike,

 

It may be the case that we misunderstood each other (much more likely to be on my end). With my reply, I'd just been trying to say something akin to "the supported process looks like this, per Compellent," not send a message saying "No, you're wrong." If I came off rude, I certainly apologize, that was absolutely not my intention. 

 

The core of what I needed to confirm with Compellent support is that snapshots weren't designed for restoring data, but for data progression down the tiers in a Compellent system.

 

It has been recommended to me to help get you to one of our phone teams, though. I wouldn't recommend posting your service tag (or serial number) in the public forum, but you can send it to me in a private message and I can see what options there may be, or I can get you the number to support and you can go ahead and call into the queue. If there's anything I can do to facilitate, please let me know.

July 22nd, 2020 01:00

Sorry for the late reply - I missed your response. This recovery procedure was a fail-safe in case another procedure went wrong; it wasn't needed, so I haven't been able to test it. But I know it would work, as we regularly use snapshots to refresh a test system with live data, and that procedure is very similar.

So if snapshots weren't designed to restore data, then I think we can be pretty sure there is no method to simply roll ORIG back to SNAP, and so restoring means you get a new set of serial numbers, which creates a lot of work at the O/S level.

I didn’t take your response to mean I was wrong so no need for an apology, I was just trying to clarify procedure.

So I am clear now on how to recover using snapshots, but can you point me to any documentation that explains how snapshots are used for “data progression down the tiers”? On our arrays we have 3 daily snapshots, and it is clear that 2 are incremental snapshots (so only taking up space for the changes in the last 24 hours), but the first snapshot looks like it may be a full snapshot. It would be good if you have some documentation that explains how these are used for “data progression down the tiers” and how much space they consume.

Thanks

Mike

2.9K Posts

July 22nd, 2020 09:00

Let's back up a step. If I just missed this in your posts, I apologize, but what SCOS version and DSM version are you running? Any documentation I can dig up is almost certainly going to be tied to a particular release, so it would probably be best to get that first.

 

If that's information you'd rather not share in the thread, you're also welcome to send it to me in a direct message. 

July 27th, 2020 02:00

I was not looking for really specific information - just the URLs of documents that give more information about snapshots and data progression on the SC4020. You referenced https://dell.to/382j5dK above, and I was already using this, but it does not give very detailed information. It has a section:

“Data Progression” on page 40, which says:

“Storage Center also uses Data Progression to move snapshots”

so it makes sense that you may want to move snapshot data to a different tier, but this doesn’t talk about snapshots being used to enable Data Progression. Page 114, however, says: “Use snapshots to create a point-in-time copy (PITC) of one or more volumes. Creating volume snapshots allows the volume to take full advantage of data progression.” This is more along the lines of what you said, in that snapshots are required in order to use Data Progression.

Now I am not looking for you to explain this - just point me to a document that gives more information about Data Progression and snapshots for the SC4020. For example, where I have used Data Progression before, I have specified a time frame, e.g. move blocks that have not been used for 30 days to a lower tier. I can’t find this sort of detail in this doc - it only specifies WHEN to run Data Progression, not how old blocks should be before they are moved to a different tier. So I am just looking for a document describing the functionality of Data Progression and snapshots on the SC4020, and I was hoping such a document exists.

Thanks

Mike

2.9K Posts

July 27th, 2020 15:00

I spent a fair amount of my day trying to locate a doc resource for you, but unfortunately I've been advised that one doesn't exist. That said, I can do the next best thing: one of the Compellent support engineers has offered to type something up for me to send to you, if that would help.

 

Once he finishes, I can try to post it here, or I can email you, if that's better. I'd just need you to PM me whatever email box you'd want to use, if that's the route you'd want to go.

July 28th, 2020 05:00

Thanks, that would be great - I think it is best to share the knowledge here so that other people can make use of it. To clarify, the sort of things I am looking at are:

Data Progression:

How many days does a block have to go unaccessed before it is moved to a different tier, and is it moved back if it is accessed again? Is this configurable? Are there any thresholds, like needing X access attempts rather than just one in a time period, and is any distinction made between reads and writes?

Snapshots:

How are these used for “Data Progression”, and if you want to use snapshots for other purposes, does this interfere with data progression? To understand this, I think we need to understand how the snapshots work. The Storage Manager guide you mentioned says, on page 126:

Storage Center snapshots differ from traditional snapshots/PITCs because blocks of data or pages are frozen and not copied. No user data is moved

So in the example below, we have 3 daily snapshots taken at 12am on the 14th, 15th and 16th, and a manual snapshot taken on the 16th at 10:43am - which of these are used for data progression?

[Screenshot: CVA daily snapshot (3).png]

So I guess that on the 15th, 5.46GB was written to the volume, and this data actually gets written to the snapshot as the volume is frozen (as opposed to COW snapshots, which copy data from the volume to the snapshot and then write to the volume). On the 16th, 7.32GB was written. But how much was written on the 14th, and what does “Space Recovery” mean? It is set to “Yes” for the 14th and “No” for all the others (I looked for the term “Space Recovery” in the Storage Manager guide, but could not find it).

When the snapshots expire, the data in them will need to be written to the volume.

If you use a snapshot for recovery, then presumably the snapshot won’t expire while it is mapped to a server, so does this hinder data progression in any way? (And to let the snapshot expire, you will at some point need to copy it to a volume.)

And will manual snapshots affect data progression - if, for example, I present a manual snapshot to a test system and then, after a month, create a second snapshot for the test system and delete the first?

Thanks

Mike

2.9K Posts

July 29th, 2020 06:00

Morning Mike,

 

I got the information from my friend today. The following is a direct paste from him. Please let me know if this helps.

 

Mike,

Sorry I took so long to respond.
 

Data Progression works on a '12 down, 4 up' schedule. This means that if a block has not been consistently touched (read from) in 12 data progression cycles, it progresses down. If in 4 cycles it HAS been consistently touched (read from), the array moves it up. Since each cycle runs roughly once per day, that is no more than 12 days down, 4 days up. This is not configurable, and there are no thresholds to modify. If new data is written to the volume, it goes into the 'Active' snapshot, waiting to be progressed. If that data is a modification of already existing data, then the existing data will eventually be expired by the snapshot schedule and its block returned to the pagepool. Meanwhile, the modified block is progressed to its appropriate tier until it is deleted, modified, or never accessed and stays in tier 3.
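The '12 down, 4 up' rule described above can be sketched as a pair of per-block counters. This is a simplified toy model of the behaviour the engineer describes, not actual SCOS code, and the class and tier numbering are made up for illustration:

```python
class Block:
    """Toy model: demote after 12 consecutive untouched cycles,
    promote after 4 consecutive touched (read) cycles."""

    def __init__(self, tier=1):
        self.tier = tier          # 1 = fastest tier, 3 = slowest
        self.idle = 0             # consecutive cycles without a read
        self.active = 0           # consecutive cycles with a read

    def cycle(self, read):
        """Run one (roughly daily) data progression cycle."""
        if read:
            self.active += 1
            self.idle = 0
            if self.active >= 4 and self.tier > 1:
                self.tier -= 1    # consistently touched: move up a tier
                self.active = 0
        else:
            self.idle += 1
            self.active = 0
            if self.idle >= 12 and self.tier < 3:
                self.tier += 1    # untouched for 12 cycles: move down
                self.idle = 0

b = Block(tier=1)
for _ in range(12):               # 12 daily cycles with no reads
    b.cycle(read=False)
print(b.tier)                     # 2 - the block has progressed down

for _ in range(4):                # 4 consecutive cycles with reads
    b.cycle(read=True)
print(b.tier)                     # 1 - promoted back up
```

Note that in this model only reads count toward promotion, matching the engineer's description that new writes instead land in the 'Active' snapshot.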

 

How are snapshots used with Data Progression?

 

Think of them as an empty container for writes. Once the snapshot schedule creates a new snapshot, that container is closed, and thus begins the '12 down, 4 up' countdown. If blocks in that snapshot are frequently read from, the array keeps those blocks in tier 1 and everything else goes down. So really, all of the snapshots on the SAN are used by Data Progression. If you don't use snapshots, the array can still progress data, but it won't do it as efficiently.

So can I use them while Data Progression is running?

 

Absolutely - there is no impact, since we don't use COW, just pointers. Use them as read-only volumes for backup applications, or as read-only volumes for a hasty file recovery; the Storage Center can do that and progress data at the same time.

 


How much data is in the snapshot and what is Space Recovery?

 

Starting with the last snap on 9/14: the last snapshot will always be the summary of the volume data, or simply the active data your hosts read from. If you delete that snap, that ~500GB rolls up into the 9/15 snapshot, while the older, now-modified blocks from before are expired. Unfortunately, there is no way of knowing how much data was in the 9/14 snap. If you want to delete data in the snapshot, you'll need to delete the volume or delete un-needed data on the host. That is where Space Recovery comes in: Space Recovery means that the snapshot contains zeroed blocks from the host where data has been deleted. After that snapshot expires, those blocks are returned as free space.

Will using snapshots for file recovery hinder Data Progression?

 

No. Once you make a View Volume (read-only volume), the SAN treats the View Volume as a new volume with pointers to the old one. If you read data on a View Volume, you are still counting toward the '12 down, 4 up' rule, because the View Volume is accessing the block nonetheless.

 

For example: you realize you need a hasty file recovery, so you take a manual snapshot and convert it to a View Volume. So far, all the SAN thinks you have done is create a new volume with pointers to the manual snapshot. Once you mount the View Volume to your host and start reading from it, a count to progress the data up is added - or, if it is already in tier 1, a count is added to keep it in tier 1.

 

Okay, let's say you forget to delete that View Volume and the manual snapshot. What is going to happen with Data Progression? Nothing. Even if that snapshot is set to "never expire", data progression still runs. The SAN only cares about what is and is not being accessed when it comes to data progression. The snapshots are just containers of writes.


Will manual snapshots affect Data Progression?

The SAN will treat manual snapshots the same as system-created ones, even if they are a month apart. Data progression keeps running: if the data stays accessed, it stays in tier 1; if it is never accessed, it goes down to tier 3. If you expire that snapshot after taking another one, the metadata has a record of what tier each block was in prior to the expiration, keeps it there, and keeps going.

 

I hope this answers your questions, Mike!
