Re: Experience / Best Practices with long time VM snaps for huge VMFS stores?
From Josh Atwell:
I would support this concept as well. The key to this is having them dictate the culture of the service they provide. I have yet to see a reasonable use case for long term snapshots of any kind.
Even in that scenario it was less than 5 weeks.
I'd suggest simply having the team push back and requiring every request for longer snapshots to defend a business case for allowing that feature (risk mitigation, long-term rollback, etc.). If a business case is justified and approved by upper management (based on legitimate analysis of impact vs. gain), then this could be implemented in an isolated environment as part of their service, with specific datastores/LUNs dedicated to long-term snapshots. These would be thick provisioned to minimize performance impact during snapshot maintenance tasks, etc.
Just because you can, doesn't mean you should. I ran into this a lot while at Cisco working behind their portal (CITEIS). Here's how I approached feature requests.
- Identify the business objectives and benefits of the portal/feature (need business justification for dangerous activities in portal)
- Identify a design that is operationally sustainable and meets those objectives. What impact do non-standard capabilities have on the ability to recover from failure, on the environment, on neighbors, etc.?
- Implement automation to maintain design and prevent people from "going rogue" or trying to go around the portal process.
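
As a rough illustration of the automation step, here is a minimal Python sketch of a snapshot age check. Everything in it is an assumption for the sake of example: the Snapshot record, the datastore names, and the 3-day limit are hypothetical, and in practice the inventory would come from whatever your vSphere tooling reports rather than a hard-coded list. It only flags snapshots that are older than the policy allows and do not live on a datastore set aside for approved long-term snapshots.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    """Hypothetical inventory record; not a real vSphere object."""
    vm: str
    datastore: str
    created: datetime


def find_violations(snapshots, max_age, exempt_datastores, now):
    """Return snapshots older than max_age that are not on an exempt datastore."""
    return [
        s for s in snapshots
        if now - s.created > max_age and s.datastore not in exempt_datastores
    ]


if __name__ == "__main__":
    # Fixed "now" and made-up inventory so the example is reproducible.
    now = datetime(2024, 1, 31)
    inventory = [
        Snapshot("app01", "ds-standard-01", datetime(2024, 1, 30)),
        Snapshot("app02", "ds-standard-01", datetime(2024, 1, 1)),
        Snapshot("db01", "ds-longsnap-01", datetime(2023, 12, 1)),
    ]
    # Assumed policy: 3 days on standard datastores, ds-longsnap-01 is exempt.
    for s in find_violations(inventory, timedelta(days=3), {"ds-longsnap-01"}, now):
        print(f"{s.vm} on {s.datastore}: snapshot older than policy allows")
```

A report like this could feed a ticket or a notification to the VM owner. Whether to go further and consolidate or delete snapshots automatically is a separate decision that should follow from the approved design.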
In the end, if the behavior might impact the SLAs of other tenants, those customers were not given the full experience. They would get isolated to a non-standard offering with longer deployment cycles, higher costs, and limited portal capabilities. That would typically force the app owners to think more critically about whether they really need an offering such as long-term snapshots. 98 times out of 100 they decide they can live within the stricter constraints, and you never hear from them about it again.