Minimize losses by incident prioritization

25.01.2016

Prioritizing and planning activities is a prerequisite for the success of any organization, and managing this process is the responsibility of every manager. In simple situations, unwritten rules and good decision making by everyone involved are often enough. In this article, we will consider a more complex situation with several collaborating parties that have to handle tens or hundreds of individual requests. We will focus mainly on incidents (i.e. unplanned interruptions of service delivery), which are always associated with financial losses, whether direct or indirect. If we set clear rules on when incidents should be resolved, we can minimize losses when incidents occur.

Since we have IT processes in mind, we should start by consulting ITIL v3. ITIL deals with IT processes, but its principles also apply to analogous processes outside IT. In the same way, the ideas in this article may inspire teams or organizations that process various requests, orders or contracts.

ITIL v3 Service Operation treats incident priority as a function of urgency (how quickly the business needs a solution) and business impact (typically measured by the number of affected users and their importance). Urgency reflects how pressing the resolution is: if an incident tends to grow more severe, we need to act more promptly; if, on the other hand, its severity decreases, we can address it later than usual. Impact, in turn, measures financial loss, damage to reputation or other losses.

The following table shows how the priority code is derived from the combination of impact and urgency.

                   Impact: High   Impact: Medium   Impact: Low
Urgency: High           1               2               3
Urgency: Medium         2               3               4
Urgency: Low            3               4               5
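The matrix above can be expressed as a small lookup table. The following Python sketch is only an illustration; `Level`, `PRIORITY_MATRIX` and `priority_code` are hypothetical names, not part of ITIL or of any particular product.

```python
from enum import IntEnum


class Level(IntEnum):
    """Three-step scale used for both urgency and impact (1 = highest)."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Rows = urgency, columns = impact; values copied from the priority table.
PRIORITY_MATRIX = {
    Level.HIGH:   {Level.HIGH: 1, Level.MEDIUM: 2, Level.LOW: 3},
    Level.MEDIUM: {Level.HIGH: 2, Level.MEDIUM: 3, Level.LOW: 4},
    Level.LOW:    {Level.HIGH: 3, Level.MEDIUM: 4, Level.LOW: 5},
}


def priority_code(urgency: Level, impact: Level) -> int:
    """Return the priority code (1 = critical ... 5 = very low)."""
    return PRIORITY_MATRIX[urgency][impact]


print(priority_code(Level.MEDIUM, Level.HIGH))  # 2
```

With this particular numbering every cell happens to equal urgency + impact - 1, but an explicit table is easier to adjust later, for example if you decide that a certain combination should be treated more strictly.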

Then you just need to quantify each priority level, i.e. assign it a resolution time, as in the following table.

Priority code   Description   Resolution time
1               Critical      2 hours
2               High          4 hours
3               Medium        8 hours
4               Low           16 hours
5               Very low      1 week

Each priority also has a corresponding response time, which sets the deadline by which the solver must confirm that the incident has been accepted for resolution. Here, however, we will focus on the resolution time and on calculating the target time by which the incident should be closed. This target time depends on the starting time, the resolution time and when the clock is running. Let's look at these three factors in detail. None of them is a trivial question, and each calls for an assessment that best suits your organization. The point is that we must be able to comply with the rules we set, so these rules should correspond to the organization's capabilities. Like any other goals, goals in the incident management process should be realistic and attainable (see the concept of SMART goals).
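The simplest case can be sketched in a few lines of Python, assuming the starting time is the moment the incident was reported and the clock runs constantly (24/7). The resolution times from the table become a lookup and the target time is just a sum; `RESOLUTION_TIME` and `target_time_24x7` are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta

# Resolution times per priority code, taken from the table above.
RESOLUTION_TIME = {
    1: timedelta(hours=2),
    2: timedelta(hours=4),
    3: timedelta(hours=8),
    4: timedelta(hours=16),
    5: timedelta(weeks=1),
}


def target_time_24x7(reported_at: datetime, priority: int) -> datetime:
    """Target resolution time when the clock runs constantly (assumption)."""
    return reported_at + RESOLUTION_TIME[priority]


reported = datetime(2016, 1, 22, 15, 0)  # a Friday afternoon
print(target_time_24x7(reported, 4))  # 2016-01-23 07:00:00
```

A low-priority incident reported on Friday at 15:00 thus falls due on Saturday at 07:00, when nobody is at work.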

  1. The starting time can be taken as the time when the incident occurred, i.e. the time of the service outage, configuration item failure etc. However, it is not easy to ensure prompt identification of such an issue in every case. Therefore, we usually use the time the incident was recorded, i.e. when it was reported.
  2. In the example above, the resolution time depends on urgency and impact. Small organizations in particular may decide to work with just one factor (business impact), because real differences in urgency occur only rarely. Nevertheless, it is still necessary to define how impact should be determined. Some organizations distinguish by the number of affected users (individual, team, branch, the whole organization). However, individual services differ greatly. An outage of a system for attendance recording or user training probably does not have the same impact as an outage of a system for arranging contracts, where it is quite easy to determine the revenue the company is losing every hour. Similarly, an outage affecting a supporting back-office team need not mean an immediate loss of revenue, as it would for front-office staff doing the business. We should therefore consider business losses or other quantifiable adverse impacts. Resolution times should correspond to the resolution team's capabilities, its number of staff and the level of service your company wants to achieve. A higher level of service usually comes at a higher cost.
  3. When the clock runs while an incident is being resolved. Time runs constantly, but this need not be the case from the perspective of the effort spent on resolving an incident. Imagine that a low-priority incident is reported on Friday afternoon, just before the staff leave; in the example above, there are 16 hours to resolve it. Users do not work during the weekend and the IT team does not work shifts either. However, if nobody starts working on the incident, the period for its resolution expires during the weekend. How should the IT manager decide? When defining the process, we need to state when the clock for incident resolution is running. It should run at times when somebody can actually deal with the incident: when solution teams are on shift or on call, or when the IT manager can order overtime. The two approaches can, of course, be combined: for high-priority incidents (codes 1 and 2) the clock runs constantly, and for less important incidents only during working hours. Likewise, an infrastructure team ensuring the proper functioning of crucial systems such as Active Directory must be able to respond immediately (at night and during holidays), a team supporting a critical application works until late evening and partly during weekends, and a team for a less important system can have ordinary working hours including a lunch break.
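To show how the three factors combine, here is a Python sketch of the mixed approach from point 3, under clearly stated assumptions: the start is the reporting time; there is a single hypothetical calendar with working hours Monday to Friday, 08:00 to 16:00 (no lunch break) plus a holiday list; the clock runs constantly for priority codes 1 and 2; and "1 week" for priority 5 is interpreted as 40 working hours. All names are hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Assumption: "1 week" for priority 5 = 5 working days of 8 hours.
RESOLUTION_TIME = {
    1: timedelta(hours=2),
    2: timedelta(hours=4),
    3: timedelta(hours=8),
    4: timedelta(hours=16),
    5: timedelta(hours=40),
}
ALWAYS_RUNNING = {1, 2}  # critical and high: the clock runs constantly

# Hypothetical calendar: Mon-Fri 08:00-16:00, no lunch break, plus holidays.
WORK_START = time(8, 0)
WORK_END = time(16, 0)
HOLIDAYS = {date(2016, 1, 1)}


def is_working_day(day: date) -> bool:
    return day.weekday() < 5 and day not in HOLIDAYS


def add_working_time(start: datetime, duration: timedelta) -> datetime:
    """Move forward from `start`, consuming `duration` only in working hours."""
    remaining = duration
    current = start
    while True:
        day = current.date()
        if is_working_day(day):
            opens = datetime.combine(day, WORK_START)
            closes = datetime.combine(day, WORK_END)
            current = max(current, opens)  # reported before opening: wait
            if current < closes:
                available = closes - current
                if remaining <= available:
                    return current + remaining
                remaining -= available
        # Nothing (more) to count today: continue at midnight of the next day.
        current = datetime.combine(day + timedelta(days=1), time.min)


def target_time(reported_at: datetime, priority: int) -> datetime:
    """Target resolution time: 24/7 for codes 1-2, working hours otherwise."""
    duration = RESOLUTION_TIME[priority]
    if priority in ALWAYS_RUNNING:
        return reported_at + duration
    return add_working_time(reported_at, duration)


friday_afternoon = datetime(2016, 1, 22, 15, 0)
print(target_time(friday_afternoon, 4))  # 2016-01-26 15:00:00
print(target_time(friday_afternoon, 2))  # 2016-01-22 19:00:00
```

The low-priority incident from the Friday example now falls due on Tuesday at 15:00 (one hour on Friday, eight on Monday and seven on Tuesday), while a high-priority one still has to be closed on Friday evening. Separate calendars per team would require turning the opening hours and holiday list into parameters instead of module constants.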

How can these requirements be tackled in practice? First of all, ask thoroughly about the capabilities of any system you are considering implementing. Otherwise, you can easily run into requirements that your incident management software cannot fulfil. Besides easy definition of different calendars (working hours) for different teams, also check whether the system can change the target resolution time when you realize, during resolution, that the incident's priority should be raised or lowered.
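Recalculating the target after a priority change is straightforward if the target is derived rather than stored. The Python sketch below assumes one particular policy, namely that the target is always counted from the original reporting time using the resolution time of the current priority, and for brevity lets the clock run 24/7; `Incident` and `change_priority` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Resolution times per priority code (clock assumed to run 24/7 here).
RESOLUTION_TIME = {
    1: timedelta(hours=2),
    2: timedelta(hours=4),
    3: timedelta(hours=8),
    4: timedelta(hours=16),
    5: timedelta(weeks=1),
}


@dataclass
class Incident:
    reported_at: datetime
    priority: int
    # (changed_at, old_priority, new_priority) for every priority change
    history: list[tuple[datetime, int, int]] = field(default_factory=list)

    @property
    def target(self) -> datetime:
        """Derived, never stored, so it always matches the current priority."""
        return self.reported_at + RESOLUTION_TIME[self.priority]

    def change_priority(self, new_priority: int, changed_at: datetime) -> None:
        if new_priority not in RESOLUTION_TIME:
            raise ValueError(f"unknown priority code: {new_priority}")
        self.history.append((changed_at, self.priority, new_priority))
        self.priority = new_priority


incident = Incident(reported_at=datetime(2016, 1, 25, 9, 0), priority=4)
print(incident.target)  # 2016-01-26 01:00:00
incident.change_priority(2, changed_at=datetime(2016, 1, 25, 10, 0))
print(incident.target)  # 2016-01-25 13:00:00
```

One consequence worth deciding on explicitly: under this policy, raising the priority late can yield a target that has already passed; an alternative policy counts the new resolution time from the moment of the change. Keeping the change history also lets you report on how often priorities are corrected.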

And what does ObjectGears provide in this area? Apart from the prioritization matrix described above or single-factor prioritization (business impact only, without urgency), it offers, for example:

  • marking incidents as VIP (a notebook failure of an ordinary user versus a failure of the general manager's notebook)
  • easy definition of working-hour calendars reflecting regular and irregular exceptions such as bank holidays, working Saturdays, extraordinary shifts etc., so that the planned resolution time reflects all the specifics of the given organization
  • a single task queue. An incident solver often also deals with other requests (tasks with a deadline agreed with a superior or project manager, catalogue requests with a time defined by the type of request, requests for testing something, bugs reported from tests etc.). Therefore, it is necessary to prioritize not only among incidents but across a much broader spectrum of task types. ObjectGears offers various logic for calculating these times and brings all these request types together in a single solver queue.
  • notification before the target resolution time is reached. The objective of every team working on incidents or other tasks should be to resolve them on time, within the agreed term. Notifications make it possible to identify incidents whose deadline is approaching and to notify the assigned solver or manager that they need to act.
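The last two points can be illustrated together: tasks of different types share one queue ordered by deadline, and a periodic check picks the open ones whose deadline is near. This Python sketch is a generic illustration under assumptions, not how ObjectGears implements it; `Task`, `solver_queue`, `due_for_warning` and the fixed two-hour warning window are hypothetical, and actually sending the notification (e-mail, chat message etc.) is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Task:
    """Any item in a solver's queue: incident, catalogue request, bug, ..."""
    ident: str
    kind: str
    solver: str
    deadline: datetime
    resolved: bool = False


def solver_queue(tasks: list[Task], solver: str) -> list[Task]:
    """Open tasks of one solver, ordered by deadline regardless of task type."""
    return sorted(
        (t for t in tasks if t.solver == solver and not t.resolved),
        key=lambda t: t.deadline,
    )


def due_for_warning(tasks: list[Task], now: datetime,
                    warn_before: timedelta) -> list[Task]:
    """Open tasks due within `warn_before` from `now` (overdue ones included)."""
    return sorted(
        (t for t in tasks if not t.resolved and t.deadline - now <= warn_before),
        key=lambda t: t.deadline,
    )


now = datetime(2016, 1, 25, 10, 0)
tasks = [
    Task("INC-1", "incident", "alice", datetime(2016, 1, 25, 11, 0)),
    Task("REQ-7", "catalogue request", "alice", datetime(2016, 1, 25, 10, 30)),
    Task("BUG-3", "bug", "alice", datetime(2016, 1, 27, 12, 0)),
    Task("INC-2", "incident", "bob", datetime(2016, 1, 25, 10, 15), resolved=True),
]
print([t.ident for t in solver_queue(tasks, "alice")])  # ['REQ-7', 'INC-1', 'BUG-3']
print([t.ident for t in due_for_warning(tasks, now, timedelta(hours=2))])  # ['REQ-7', 'INC-1']
```

A fixed warning window is the simplest choice; a warning at, say, a fixed share of each task's resolution time would scale better across priorities.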

Detail of regular working time definition