Incident management and Problem management

 

 

Incident management focuses on restoring services that were interrupted or degraded in an unplanned way as quickly as possible, with the objective of minimizing the impact on business. Problem management focuses on analysing the root cause of incidents, with the objective of preventing future incidents.

The text below describes the basic principles of these processes in ObjectGears, which reflect the best practice described in ITIL v3 - Service Operation.

Incident and Problem management - restore service as soon as possible and follow up on an underlying root cause of incidents.

An Incident is the fundamental entity of the Incident and Problem management processes. It represents an interruption or deterioration of a service. An Incident is created at the moment such a situation is identified.

Impact, urgency, priority and target resolution time

The key attributes of an incident are its Impact and Urgency. Impact represents how much the business is affected: by the number of users or customers involved (from individuals, through a group, to large numbers), by financial loss or by damage to the organization's reputation. When the process is implemented, the particular impact levels are described from the viewpoint of the given organization. Urgency represents how quickly the situation needs to be resolved, judged by how the impact is likely to develop: it may grow (impact may spread while the incident remains unresolved), stay the same or, on the contrary, fade out spontaneously. A priority matrix is then defined that assigns a Priority to each combination of impact and urgency.

Example of an incident management priority matrix:

Impact    Urgency   Priority
High      High      1 - Critical
High      Medium    2 - High
High      Low       3 - Medium
Medium    High      2 - High
Medium    Medium    3 - Medium
Medium    Low       4 - Low
Low       High      3 - Medium
Low       Medium    4 - Low
Low       Low       5 - Very low
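
The matrix above can be read as a simple lookup from an impact and urgency pair to a priority. The following is a minimal Python sketch of that idea using the example values from the table. The names PRIORITY_MATRIX and priority_for are hypothetical and chosen only for illustration; they do not describe how ObjectGears stores or evaluates the matrix.

```python
# Illustrative only: the example priority matrix as a lookup table.
# PRIORITY_MATRIX and priority_for are hypothetical names, not ObjectGears API.
PRIORITY_MATRIX = {
    ("High", "High"): (1, "Critical"),
    ("High", "Medium"): (2, "High"),
    ("High", "Low"): (3, "Medium"),
    ("Medium", "High"): (2, "High"),
    ("Medium", "Medium"): (3, "Medium"),
    ("Medium", "Low"): (4, "Low"),
    ("Low", "High"): (3, "Medium"),
    ("Low", "Medium"): (4, "Low"),
    ("Low", "Low"): (5, "Very low"),
}


def priority_for(impact: str, urgency: str) -> tuple[int, str]:
    """Return (code, name) of the priority for an impact/urgency pair."""
    try:
        return PRIORITY_MATRIX[(impact, urgency)]
    except KeyError:
        raise ValueError(
            f"Unknown impact/urgency combination: {impact!r}, {urgency!r}"
        ) from None


print(priority_for("Medium", "High"))  # (2, 'High')
```

Keeping the matrix as data rather than as nested conditions mirrors the idea that each organization defines its own levels during implementation of the process.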

Each priority has a corresponding Target response time and Target resolution time. For example, a critical incident has to be confirmed immediately as taken over for resolution, and its target resolution time is 1 hour.

Priority example:

Code   Name       Response time   Resolution time
1      Critical   Immediately     1 hour
2      High       10 minutes      4 hours
3      Medium     1 hour          8 hours
4      Low        4 hours         24 hours
5      Very low   1 day           1 week
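
As a rough illustration, the example priorities can be held as records with durations, from which a response deadline follows directly. This is only a sketch: PriorityTarget, TARGETS and response_deadline are hypothetical names, "Immediately" is modelled as a zero duration, and the deadline is computed in plain calendar time without regard to service hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PriorityTarget:
    code: int
    name: str
    response: timedelta    # time allowed to confirm the incident
    resolution: timedelta  # time allowed to resolve the incident


# Values taken from the example table; "Immediately" is modelled as zero.
TARGETS = {
    1: PriorityTarget(1, "Critical", timedelta(0), timedelta(hours=1)),
    2: PriorityTarget(2, "High", timedelta(minutes=10), timedelta(hours=4)),
    3: PriorityTarget(3, "Medium", timedelta(hours=1), timedelta(hours=8)),
    4: PriorityTarget(4, "Low", timedelta(hours=4), timedelta(hours=24)),
    5: PriorityTarget(5, "Very low", timedelta(days=1), timedelta(weeks=1)),
}


def response_deadline(created: datetime, priority_code: int) -> datetime:
    """Latest time by which the incident should be taken over (calendar time)."""
    return created + TARGETS[priority_code].response


created = datetime(2024, 3, 4, 9, 0)  # made-up creation time
print(response_deadline(created, 2))  # 2024-03-04 09:10:00
```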

Each incident is classified at the time of its creation by impact and urgency, and its priority is calculated from these values. The target resolution time is then determined from the affected service and the maximum resolution time for that priority.

Example: If the service is guaranteed only from 8:00 to 17:00 and an incident whose priority allows 8 hours for resolution is identified at 16:00, the target resolution time is set to 15:00 on the next day: one hour is counted on the first day and the remaining seven hours from 8:00 on the next. The target time thus reflects both the seriousness of the incident and the hours in which the service has to be available. A less important incident gets a correspondingly later target time, and staff are not forced to work extraordinary shifts to resolve it.
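
The arithmetic in this example can be sketched in Python as follows. The function name target_resolution_time and its parameters are hypothetical, and the sketch makes a simplifying assumption that every calendar day is a service day (no weekends or holidays). It illustrates the calculation only and is not the ObjectGears implementation.

```python
from datetime import datetime, time, timedelta


def target_resolution_time(created, hours_to_resolve,
                           service_start=time(8, 0), service_end=time(17, 0)):
    """Add hours_to_resolve to created, counting only guaranteed service hours.

    Assumption of this sketch: every calendar day is a service day.
    """
    remaining = timedelta(hours=hours_to_resolve)
    current = created
    while True:
        day_start = datetime.combine(current.date(), service_start)
        day_end = datetime.combine(current.date(), service_end)
        # If the incident arrives before the window opens, start at its opening.
        current = max(current, day_start)
        if current < day_end:
            available = day_end - current
            if remaining <= available:
                return current + remaining
            remaining -= available
        # Window used up (or already closed): continue the next day.
        current = datetime.combine(current.date() + timedelta(days=1),
                                   service_start)


# Incident identified at 16:00 with 8 hours for resolution (dates are made up).
print(target_resolution_time(datetime(2024, 3, 4, 16, 0), 8))
# 2024-03-05 15:00:00
```

One hour is consumed between 16:00 and 17:00 on the first day, and the remaining seven hours run from 8:00 on the next day, which gives 15:00.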

Incident categorization

An incident is categorized in order to determine the resolution group and for follow-up reporting, i.e. identifying the areas where incidents occur most often.

Incident creation and relation to other entities and processes

An incident can be created in three ways:

  • Created directly by IT staff.
  • Created by IT staff when handling a User call: the IT Service Desk identifies from the User call that there is an incident. (ObjectGears enables fast creation of an incident from a User call, and linking of the two, by means of a button in the User call record.)
  • Created automatically by a monitoring system that assesses a certain situation (e.g. unavailability of a service, server etc.).

Known error

When solving an incident, it may turn out that its cause is a Known error. IT maintains records of known errors in ObjectGears. Linking an incident to a known error later makes it possible to determine the impact of known errors on the business, the number of incidents associated with them, the success rate of their resolution etc.

Problem

When solving an incident, the solver may find that its cause is a known Problem. Linking an Incident to a Problem makes it possible to determine the impact on the business of a problem that has not been solved yet, the number of incidents or outages associated with it etc. If the solver identifies the cause of the incident and the associated problem does not exist yet, ObjectGears enables fast creation of a Problem from the Incident, and linking of the two, by means of a button in the Incident record.

A problem may also be linked to a record of a Known error.

Workaround

Since the objective of solving a Problem is to eliminate its root cause, its expected resolution time is by nature longer than that of an incident. While solving a problem, the solver sometimes identifies a Workaround, a temporary solution that circumvents the cause of the Problem. In such a case the Problem has to be linked to the Workaround. If an Incident associated with the given Problem occurs later on, the solver may resolve the Incident by following the steps described in the Workaround.
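
The chain of links described here (Incident to Problem to Workaround) can be pictured with a small data model. The sketch below is hypothetical: the class and field names and the sample records are made up for illustration and do not mirror the ObjectGears data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workaround:
    name: str
    steps: list[str]


@dataclass
class Problem:
    name: str
    workaround: Optional[Workaround] = None


@dataclass
class Incident:
    name: str
    problem: Optional[Problem] = None


def workaround_steps(incident: Incident) -> list[str]:
    """Return the workaround steps reachable through the linked Problem, if any."""
    if incident.problem is None or incident.problem.workaround is None:
        return []
    return incident.problem.workaround.steps


# Made-up sample records.
restart = Workaround("Restart service", ["Stop the service", "Start the service"])
problem = Problem("Service stops responding under load", workaround=restart)
incident = Incident("Service unavailable", problem=problem)

print(workaround_steps(incident))  # ['Stop the service', 'Start the service']
```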

Change

If a need for a Change (see Change management) is identified when solving an Incident or Problem, the Change is recorded and the Incident or Problem is linked to it. This makes it possible to trace back the Changes made in order to solve a certain Incident or Problem.

Functional and hierarchic escalation

If it is obvious that an incident cannot be resolved by Service Desk staff, it is functionally escalated: the incident is passed to 2nd level solvers (a specialized team for the area the incident belongs to, e.g. Servers, DB Administrators...). Similarly, an Incident or Problem can be escalated to 3rd level support, i.e. a vendor. At that moment a Service request is created on the basis of the contract between the customer and the vendor. Since ObjectGears enables effective linking of particular entities, Service requests can also be linked in the Incident detail. The solver, a manager or any other member of staff can then see the complete picture of how the given incident or problem is being solved, including the Changes performed and the work of vendors to whom the Incident or Problem was escalated.

Hierarchic escalation takes place when it is obvious that an Incident was not or will not be resolved in time and it is necessary to allocate extra resources or plan extraordinary steps to resolve it. The responsible IT managers are informed as part of hierarchic escalation.
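
The condition for hierarchic escalation can be expressed as a simple check against the target resolution time. The sketch below is an assumption-laden illustration: the function name and the estimated_remaining parameter (the solver's estimate of the work still needed) are hypothetical and are not taken from ObjectGears.

```python
from datetime import datetime, timedelta


def needs_hierarchic_escalation(now: datetime,
                                target_resolution: datetime,
                                estimated_remaining: timedelta) -> bool:
    """True if the incident is overdue or is expected to miss its target.

    estimated_remaining is a hypothetical input: the solver's estimate of
    how much time is still needed to resolve the incident.
    """
    return now + estimated_remaining > target_resolution


target = datetime(2024, 3, 5, 15, 0)  # made-up target resolution time

# Two hours of work left, three hours until the target: no escalation.
print(needs_hierarchic_escalation(datetime(2024, 3, 5, 12, 0), target,
                                  timedelta(hours=2)))  # False

# Two hours of work left, one hour until the target: escalate.
print(needs_hierarchic_escalation(datetime(2024, 3, 5, 14, 0), target,
                                  timedelta(hours=2)))  # True
```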

Knowledge Base

Knowledge gained from solving Incidents and Problems should be recorded in the Knowledge Base, and the created or updated articles should be linked to the relevant Configuration items.

Configuration items

Configuration items have to be linked to the Incidents, Problems, Changes and other entities used in Incident and Problem management that they relate to. When a Configuration item is displayed, ObjectGears then shows the Incidents, Problems, Changes and other entities linked to it. This gives the user a complete view of the given Configuration item.

Reporting

The operation of the process has to be monitored and evaluated: whether it meets expectations (e.g. meeting the deadlines for incident resolution), how particular solvers perform (e.g. number of closed incidents, number of incidents resolved after the deadline) and in which areas most incidents occur, so that those areas get the most attention. This is where predefined reports can be utilized. These reports can be easily modified, and new reports can be created.
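
To make the example metrics concrete, the following Python sketch counts closed incidents and incidents resolved after the deadline per solver, and incidents per category. The record layout, the function name and the sample data are hypothetical; the predefined ObjectGears reports are built inside the product and are not shown here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class IncidentRecord:
    solver: str
    category: str
    target_resolution: datetime
    resolved_at: Optional[datetime]  # None while the incident is still open


def summarize(incidents):
    """Return (closed per solver, overdue per solver, incidents per category)."""
    closed, overdue, by_category = Counter(), Counter(), Counter()
    for inc in incidents:
        by_category[inc.category] += 1
        if inc.resolved_at is not None:
            closed[inc.solver] += 1
            if inc.resolved_at > inc.target_resolution:
                overdue[inc.solver] += 1
    return closed, overdue, by_category


# Made-up sample data.
sample = [
    IncidentRecord("Alice", "Servers", datetime(2024, 3, 5, 15, 0),
                   datetime(2024, 3, 5, 14, 0)),
    IncidentRecord("Alice", "Servers", datetime(2024, 3, 6, 10, 0),
                   datetime(2024, 3, 6, 12, 0)),
    IncidentRecord("Bob", "Databases", datetime(2024, 3, 7, 9, 0), None),
]

closed, overdue, by_category = summarize(sample)
print(closed["Alice"], overdue["Alice"], closed["Bob"])  # 2 1 0
print(by_category.most_common(1))  # [('Servers', 2)]
```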

Examples of screens

Incident detail

The key properties of an incident are name, code, description, date and time of occurrence, customer, solver, business impact and urgency (from which the priority and the time for incident resolution are calculated) and the target resolution time. In the bottom part of the screen there are tabs with associated changes that were performed while solving the incident, service requests to partners (escalation of the incident to companies providing 3rd level support), user calls that relate to the incident and the history of the incident (reporting of the time spent on solving the incident by particular workers). At the very bottom there is the NEW PROBLEM button. It sets up a problem record with the data included in the incident and links the incident to it; it is used when the solver identifies a general cause that persists even after the incident was resolved and the service restored.


Incident detail 2


Other entities can be linked to the incident: a Known error (an identified error that occurs in the system and is acknowledged by the system vendor as a bug; typically the vendor specifies the version or patch that fixes the bug), a Problem (a general root cause that remains even after service restoration) or the Configuration items that relate to the incident.

Problem detail

In a problem we record similar facts as in an incident. We can refer to a Known error or to a Workaround that prevents the problem from manifesting itself. In the bottom part of the screen we can see associated incidents (frequency of problem occurrence), service requests (escalation to partners providing 3rd level support), associated changes performed while solving the problem and the history of problem solving.


Known error detail


We keep a record of the name, code and description of a known error. In the bottom part of the screen we can see an overview of the incidents and problems for which the known error was identified as the root cause.

Service request detail

In a service request we can follow the progress of a task handled by a service partner. We can record the service contract on which the service request is based, the ID of the task in the partner's system and references to the incident, problem or task that it relates to.


Dashboard with reports for Incident and Problem management


Part of the dashboard screen with the evaluation of the Incident and Problem management process. Newly created reports can be inserted into the dashboard. Similarly, new dashboards dedicated to particular reports can be created and linked to one another.

Maximizing dashboard report

Several reports are displayed on the Incident and Problem management dashboard. Each can be maximized by clicking the icon in its upper right corner; the report is then displayed over the whole screen. The report can be closed in a similar way, after which the overall dashboard is displayed again.

Maximized report