
5 Practical Tips: Service-Level Agreements (1/5)

25.03.2015

Tip 1: Define clear metrics and head off subsequent disagreements

A service-level agreement always contains many individual definitions relating to metrics, reference periods, measurement methods and techniques, and project-specific terms. Most of the conceivable metrics are essentially units of time. Availability rates are a particular example: although expressed as percentages rather than units of time, they ultimately describe periods of time. Units of time are also the relevant metrics for defining response times, reaction times or mean time to repair. Other conceivable metrics are the number of outages in a given period and, less frequently, the number of man-hours, days, weeks or months required to provide development, testing, maintenance or servicing work (particularly for proprietary applications).

Where telecommunications services are concerned, conceivable metrics are bandwidths, throughput figures or packet loss rates. Contracts with call centre service providers, for instance, set out the number of calls taken per unit of time (inbound or outbound), the maximum permissible time before a call centre agent answers an incoming call, and the volume of calls that may instead go to voicemail, combined with the reaction time within which a call centre agent must respond to these voicemail messages. Where settlement services (securities, payment transactions, loans etc.) are concerned, the essential criteria for service-level agreements are the capacities to be managed per unit of time and the reaction time between the occurrence of an event triggering the provision of services and the rendering of these services.

Depending on the specific type of project, service levels might involve several of the aforementioned metrics; where the provision of IT services is concerned, however, availability and response times will almost always be defined.

Example: Availability rates for IT systems

Nowhere else in contractual practice do errors occur as frequently and in such variety as in the definition of availability rates as an essential service level. This often starts with the failure to specify precisely whether the required availability refers merely to the operation of a server, to the operation of an entire system platform up to the application layer, or to the availability of an application on the server or even at the user’s workstation. It is also often unclear what the parties actually regard as availability, i.e. when a system counts as available and, where appropriate, how a partial system outage affects the determination of availability rates in complex client-server structures. Moreover, a system may be running but so busy with a large transaction that users cannot start or successfully complete other transactions; it is then operational but not available to those users. The agreement must therefore precisely define what the parties understand by “availability” and under what conditions they deem a particular system to be available.

Availability rates are normally expressed as percentages representing the total time that the system, as defined, was actually available in relation to the maximum total time that the defined system should have been available. Because the percentage, which is usually all that the agreement states, is merely a ratio of two total times, it is necessary to specify not only what counts as time during which the system was actually available as defined in the contract, but also the reference period over which that ratio is measured, e.g. X% per week, month, quarter or calendar year. In practice, it is not uncommon for the reference period to be absent from the contract itself and to be inferred indirectly from the intervals at which the contractor is required to report on the services provided and the service levels achieved. This is all the more remarkable given the extent to which results may differ depending on the reference period: 98.5% availability over a full week (168 hours) means that a system can fail and thus be unavailable for 2.52 consecutive hours per week while the contractor has nevertheless fully complied with its performance obligations; the same percentage over a year of 365 days means that the system can fail and thus be unavailable for 5.475 consecutive days, i.e. 131.4 consecutive hours, while the contractor has still fully complied with its performance obligations.
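The arithmetic behind these figures can be reproduced with a short calculation. The following Python snippet is only an illustrative sketch; the function name `allowed_downtime_hours` and the constants are assumptions made for this example and are not part of any contract template.

```python
HOURS_PER_WEEK = 7 * 24
HOURS_PER_YEAR = 365 * 24  # a 365-day year, as in the example above


def allowed_downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Return the downtime (in hours) that still satisfies the agreed rate."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    # The agreed percentage is a ratio of available time to reference time,
    # so the tolerated downtime is the remaining share of the reference period.
    return period_hours * (100 - availability_pct) / 100


for label, hours in (("week", HOURS_PER_WEEK), ("year", HOURS_PER_YEAR)):
    downtime = allowed_downtime_hours(98.5, hours)
    print(f"98.5% per {label}: {downtime:.2f} hours of downtime tolerated")
```

With 98.5% as input, the calculation yields 2.52 hours for a week and 131.4 hours for a year, which makes plain how strongly the same percentage depends on the reference period.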

Incidentally, the last example in particular shows that defining a percentage in relation to a certain period is not always sufficient to ensure an adequately precise and, above all, tolerable definition of system availability. Studies have shown that fewer and fewer companies are able to survive the complete failure of essential systems, even for a short time (one to two days at most), without running the risk of insolvency. The definition of an availability rate is thus only complete and only makes economic sense if, in addition to the percentage, it specifies the maximum permissible duration of an individual system failure and the maximum permissible number of system failures per time unit (day, week, month, quarter or year). Aside from this economic aspect, the reference period should also be chosen so that the client can still manage the service provider effectively: the longer the reference period, the fewer formal possibilities the client has to intervene and take corrective action.
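How the three criteria interact can be sketched in a few lines of Python. The class `AvailabilityTerms`, the function `meets_terms` and the limits used (4 hours per outage, 12 outages per year) are purely hypothetical values chosen for illustration; real limits must be negotiated for each project.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityTerms:
    """Hypothetical contract terms for one reference period."""
    min_availability_pct: float     # agreed availability rate
    max_single_outage_hours: float  # longest tolerated individual outage
    max_outages: int                # tolerated number of outages


def meets_terms(terms: AvailabilityTerms, period_hours: float,
                outage_hours: list[float]) -> bool:
    """Check all three criteria against the outages recorded in the period."""
    total_down = sum(outage_hours)
    achieved_pct = 100 * (period_hours - total_down) / period_hours
    return (
        achieved_pct >= terms.min_availability_pct
        and all(o <= terms.max_single_outage_hours for o in outage_hours)
        and len(outage_hours) <= terms.max_outages
    )


# Assumed example terms: 98.5% per 365-day year, at most 4 hours per outage,
# at most 12 outages per year.
terms = AvailabilityTerms(98.5, 4.0, 12)
year_hours = 365 * 24

print(meets_terms(terms, year_hours, [2.0, 3.0, 1.5]))  # three short outages
print(meets_terms(terms, year_hours, [100.0]))          # one 100-hour outage
```

In the second case the annual percentage is still above 98.5%, yet the check fails because a single outage exceeds the agreed maximum duration. This is exactly the situation that a percentage alone does not capture.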

Example: Call centre response times

Call centre services often represent an important link in the chain between a manufacturer or service provider (e.g. a mail-order business) and its customers. What is crucial here is the speed with which the call centre can be reached: customers usually regard waiting longer than 30 seconds for a call to be answered as unacceptable, hang up in exasperation and can then be considered lost. For this reason, agreements with call centre service providers must define, in addition to the maximum number of calls to be answered per time unit, a second essential service level: the maximum permissible time between the arrival of an incoming call in the service provider’s telephone system and the answering of the call in person by a call centre agent. Either 100% of incoming calls must be answered within this period, or it can be agreed that a lower percentage will be answered in person, with the remaining calls at least being diverted to voicemail. The calls sent to voicemail must then be responded to by callback or processed in some other way within an agreed period, which is to be defined as a further service level.
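A service level of this kind can be measured from call records, as the following Python sketch suggests. The data layout is an assumption made for this example: each entry is the waiting time in seconds until an agent answered in person, and `None` marks a call that was diverted to voicemail instead. The function name `answered_in_person_pct` is likewise hypothetical.

```python
from typing import Optional, Sequence


def answered_in_person_pct(wait_seconds: Sequence[Optional[float]],
                           threshold_seconds: float) -> float:
    """Percentage of incoming calls answered by an agent within the threshold.

    A value of None stands for a call that went to voicemail and therefore
    does not count as answered in person.
    """
    if not wait_seconds:
        raise ValueError("no calls recorded in the reference period")
    hits = sum(1 for w in wait_seconds
               if w is not None and w <= threshold_seconds)
    return 100 * hits / len(wait_seconds)


# Five incoming calls: three answered within 30 seconds, one answered too
# late (31 seconds) and one diverted to voicemail.
calls = [5, 12, 31, None, 20]
print(answered_in_person_pct(calls, threshold_seconds=30))
```

The result can then be compared with the percentage agreed in the contract. The callback period for voicemail messages would need a separate measurement, since it is a service level of its own.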
