We are developing SLAs between storage and the UNIX group. These cover standards, availability and performance. My question: one SLA under performance is a service time of <20ms on production disks. Service time is measured from kernel back to kernel and can have higher spikes. Does anybody have experience in developing reasonable SLAs? There have to be disclaimers put in, especially for performance.
I would try to keep SLAs limited to delivery and response (someone returns the call) times.
Performance SLAs on the storage layer are tricky business and may cause you more trouble than they are worth. And if your organization is as fanatic about SLA breaches as mine, I would never agree to any SLAs that are not directly tied to a measurable business impact.
For example, having the performance SLAs at the application level (e.g. >10,000 transactions per sec) as opposed to on the SAN level (disk response time < 15 ms) makes more sense. Why? Unless the app is showing performance degradation, who cares in the middle of the night what the latency is on the disk?
I have to agree with Uwe here. Unless you are providing service to an outside entity I think you should be careful what gets included in the SLA.
Absolute worst case (if they insist on defining certain performance characteristics) you should consider setting these up as Service Level Targets (or, as some people are calling them, Service Level Objectives). This way you can show that you are going to attempt to provide that level of performance, but you are not promising it... a breach would be investigated, but does not necessarily become the reason to drive an immediate expensive upgrade.
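As a rough illustration of how such a target could be checked without letting occasional spikes count as a breach, here is a minimal Python sketch. The 20 ms figure comes from the original question; the choice of the 95th percentile, the function names, and the sample data are assumptions made only for this example, not anything agreed in this thread.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def check_target(samples_ms, target_ms=20.0, pct=95):
    """Compare a percentile of disk service times against a target.

    target_ms and pct are assumed values for illustration; the result
    flags the period for investigation instead of declaring an SLA breach.
    """
    observed = percentile(samples_ms, pct)
    return {
        "observed_ms": observed,
        "target_ms": target_ms,
        "investigate": observed >= target_ms,
    }

# Hypothetical service times (ms): mostly fast, with a few spikes.
mostly_fast = [5.0] * 95 + [80.0] * 5
often_slow = [5.0] * 90 + [80.0] * 10

print(check_target(mostly_fast))  # spikes stay above the 95th percentile
print(check_target(often_slow))   # sustained slowness trips the target
```

Measuring a percentile over a reporting window, as opposed to every single I/O, is one way to write the "higher spikes" disclaimer into the target itself.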
I really like Uwe's idea of driving this from the perspective of what the business need actually is (as opposed to the nuts and bolts). If they are getting what they need then why dig deeper? Plenty of time for that in troubleshooting why they aren't getting the number of transactions they need (or as another example we have SLTs around how long it will take batch to run or backups to complete).
If performance is a component in the SLA calculation, it will sometimes create situations that are difficult to handle. Performance issues need a root cause analysis, a solution and then its implementation, all of which take time. One can't do all these steps in a moment in order to meet the SLA. Performance should be measured as a separate entity in service delivery.