SKT2
2 Intern
1.3K Posts
August 18th, 2011 07:00
Thanks for sharing this. You mentioned this is applicable to "Host based ZFS mirrors". Are there other options?
solvme
2 Posts
August 18th, 2011 08:00
Hi there. It's difficult to say about other file system types because ZFS is the only one we have to test on. We have moved 100% over to ZFS because it suits us. As for other options: we also ran the snap against a single ZFS plex, which gave us RAID 5 resilience on the hardware with hot spares etc., but we would lose the HA safety net should that room lose power, get destroyed and so on. With a non-mirrored system the problem doesn't occur.
What we have noticed since the initial blog is that the offline/online process on a 400GB database that has been rolled back takes about 30 minutes to become consistent. This is far better than a full detach and re-attach, which takes 4 hours or so. We noticed that the initial ZFS projection was 10 hours to online, but it quickly corrects its estimate and flies through. Offlining and onlining around the session snaps takes seconds, so that is no problem.
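For anyone wanting to try this, the offline/online dance around the snap looks roughly like the sketch below. It is only an illustration of the sequence I described, not our exact script: the pool and device names are made up, and the array-side snap/rollback step depends entirely on your EMC tooling, so it is left as a comment.

```shell
#!/bin/sh
# Hypothetical sketch of offlining a mirror plex around an array snap.
# POOL and SNAP_PLEX are placeholders -- substitute your own names.
POOL=datapool
SNAP_PLEX=c2t0d0        # the mirror half that lives on the snapped LUN

# 1. Take the plex on the snapped LUN offline before touching the snap
zpool offline "$POOL" "$SNAP_PLEX"

# 2. Activate or roll back the array-side snap session here,
#    using whatever your EMC tooling provides (placeholder step).

# 3. Bring the plex back; ZFS resilvers it against the live half
zpool online "$POOL" "$SNAP_PLEX"

# 4. Watch the resilver progress and its (initially pessimistic) estimate
zpool status -v "$POOL"
```

The point is that the snapped plex is never part of the live pool while the snap session changes underneath it, which is what avoids the inconsistent rollback results.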
The one thing we can guarantee is that without doing this, you get very different results on rolling back. It may be something EMC wish to test in more detail, and I have sent a fuller description to them. What worries me is that this could have left us with a corrupted system if we hadn't tested it. ZFS is very clever, but this proves one step too far for it.
I suggest you try some simple benchmarked rollbacks without offlining before the snap and you'll see the issue. We are on the latest Solaris 10 patch level, so it's not an older issue raising its head.
Hope this helps.