Which practices are typically involved in the implementation of a problem resolution?

Provide a permanent solution to recurring issues

Analyze root cause and fix known problems

Introduction

Businesses spend huge amounts on firefighting activities, and it is crucial to resolve these issues as fast as possible because they directly impact productivity. ITIL is a framework that includes a set of best practices for service support and delivery. Problem Management is one such ITIL process, and it aims to prevent incidents from occurring. Businesses often confuse it with the Incident Management process because of their similarities, and many organizations do not have a Problem Management process at all. Incident Management deals with resolving issues as soon as possible and restoring services to normal, whereas the primary goal of Problem Management is to provide a permanent resolution and prevent these incidents from recurring, with the help of the Change Management process.

It is fundamental to understand the differences between the two before implementing either process. Problem Management helps businesses reduce costs by identifying and preventing critical incidents, which means fewer service interruptions and less productivity loss. While striving for service excellence, businesses must deliver seamless support and offer extraordinary service to their users. Problem Management is part of the ITIL service operation lifecycle stage. It is closely aligned with other ITIL modules, such as Change Management and Release Management, in order to plan and deploy a permanent fix for a recurring incident. Most organizations do not appreciate the importance of Problem Management when they implement ITIL, but it is important to understand the business value and benefits of this process.

In this Problem Management guide, let us look in detail at the objectives, scope, process flow, techniques, benefits, feature checklist and KPIs associated with the Problem Management process, along with suitable examples.

What is ITIL Problem Management?

Problem Management is an IT Service Management (ITSM) process that prevents problems and incidents from occurring and resolves known problems with a permanent solution. Recurring incidents give rise to a problem. The objective of Problem Management is to diagnose the root cause of repeated incidents, so Root Cause Analysis (RCA) is an important step in the process. Incident Management aims to restore services as fast as possible; if the same incident occurs frequently and has a high impact, it is moved to the Problem Management team, which analyses the root cause and finds a solution. Problem Management provides either a workaround for the problem or a permanent solution.

Problem Management uses a common database to track problems. It starts with problem diagnosis and tries to provide a workaround or a permanent solution. A known error database (KEDB) is maintained for open problems. The KEDB is used to track known issues, and resolving them often involves changes to Configuration Items (CIs). Problem Management and Configuration Management share CI-related details with each other. Whenever a problem is reported, it is vital to check the CIs involved and update them if needed in order to resolve the issue permanently. Information consistency across these modules is important for faster resolution of incidents and problems and for timely deployment. To remain competitive, businesses must have speed to market and agility.

Objectives

Uninterrupted service is a dream come true for any service desk. In reality, issues do arise, and it is the responsibility of the service desk to mitigate the impact and respond as fast as possible. At the same time, end users' expectations have increased, and they demand easily accessible service desk touchpoints. The primary objective of Problem Management is to identify and troubleshoot repeating incidents by finding the root cause. It aims to proactively prevent problems from occurring and to find a workaround or a permanent solution. By being proactive, Problem Management reduces the number of incidents. It also reduces the long-term cost associated with firefighting activities and service downtime. As a result, end user satisfaction improves over time and the organization realizes real business and customer value.

  • Identify the root cause of repeated incidents

  • Provide a workaround in short term to known problems

  • Provide a permanent resolution to frequently occurring incidents

  • Deliver proactive service support

Definitions
  • Problem - The underlying cause of one or more incidents. A problem is typically raised when one or more repeated incidents have an unknown cause

  • Incident - An unplanned interruption or service disruption that affects the normal operation and quality of a service. The incident is the effect and the problem is the cause

  • Known error - A problem with a known root cause but no permanent solution. A workaround is provided for a known error

  • Root cause - The underlying cause of a problem. Root cause analysis (RCA) is a method to identify the actual root cause so that it can be eliminated permanently

  • Workaround - A short-term, temporary solution to a known error

  • KEDB - The Known Error Database is a common repository for all known errors. The KEDB is checked whenever incidents occur frequently

Problem Management in ITIL Service lifecycle

Problem Management belongs to the ITIL service operation stage and interacts with a number of other processes in the ITIL service lifecycle. Within service operation, it works closely with Incident Management to address repeated incidents and prevent major incidents from occurring. In service design, problem history is crucial input for Availability Management. Knowledge Management, which belongs to service transition, is used to record known errors and their workarounds as knowledge base articles. While performing RCA, Problem Management interfaces with the Knowledge Management process to look for a potential solution that is already available. Finally, Proactive Problem Management contributes to Continual Service Improvement by improving service quality.

Problem Management is crucial at every stage of the ITIL service lifecycle. Therefore, it is a costly mistake to ignore this process while setting up ITIL processes at your organization. While choosing a service desk solution, ensure that it supports all the features needed to perform the Problem Management process.

Problem Management Process flow

ITIL Problem Management follows a sequence of steps to identify, diagnose and resolve problems. This predefined process flow helps organizations carry out Problem Management correctly without confusing it with Incident Management. The process flow covers the following steps:

  • Problem detection

  • Problem logging

  • Investigation and diagnosis

  • KEDB

  • Resolution
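The steps listed above can be read as stages that a problem record moves through. The following is a minimal sketch of that idea, not part of ITIL itself or of any service desk product: the `Stage` names, the `ALLOWED` transition table and the `advance` helper are all assumptions made for illustration.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stages mirroring the process flow steps."""
    DETECTED = "detected"
    LOGGED = "logged"
    INVESTIGATING = "investigating"
    KNOWN_ERROR = "known_error"
    RESOLVED = "resolved"


# Assumed transition table: after investigation a record either becomes
# a known error (workaround only) or is resolved with a permanent fix.
ALLOWED = {
    Stage.DETECTED: {Stage.LOGGED},
    Stage.LOGGED: {Stage.INVESTIGATING},
    Stage.INVESTIGATING: {Stage.KNOWN_ERROR, Stage.RESOLVED},
    Stage.KNOWN_ERROR: {Stage.RESOLVED},
    Stage.RESOLVED: set(),
}


def advance(current, target):
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


stage = Stage.DETECTED
for nxt in (Stage.LOGGED, Stage.INVESTIGATING, Stage.KNOWN_ERROR, Stage.RESOLVED):
    stage = advance(stage, nxt)
print(stage.value)
```

Encoding the flow as a table makes it harder to skip a step by accident, for example resolving a problem that was never logged.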

Problem detection

The first step is to detect the problem, and this can be done in a variety of ways. The Tier I team escalates incidents that it is unable to resolve. A problem can also be identified by reviewing incident reports. When one or more incidents occur with an unknown cause, a problem record is created. In certain cases, a reported incident is clearly associated with a known problem; in that case, link the incident to the existing problem record. If no problem record exists, create a new one and link the related incidents. Problem detection saves a lot of resources by identifying the problem at the right time so that diagnosis gets easier. The symptoms of a problem include:

  • Escalation from the Level I team when it is unable to resolve incidents

  • Frequently repeating incidents with similar conditions

  • Incidents reported by multiple people across the organization

  • Proactive identification of a problem based on patterns and alerts from a monitoring tool
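The "frequently repeating incidents with similar conditions" symptom can be approximated by grouping incident records. The sketch below is only an illustration under stated assumptions: the `Incident` fields, the `find_problem_candidates` name and the threshold of three incidents are hypothetical and do not come from any particular service desk tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    incident_id: str
    category: str   # e.g. "email", "network"
    ci: str         # affected Configuration Item
    reporter: str


def find_problem_candidates(incidents, min_count=3):
    """Group incidents by (category, CI) and keep the groups that recur.

    Returns a dict mapping (category, ci) to the list of incident ids,
    for every group with at least `min_count` incidents.
    """
    groups = defaultdict(list)
    for inc in incidents:
        groups[(inc.category, inc.ci)].append(inc.incident_id)
    return {key: ids for key, ids in groups.items() if len(ids) >= min_count}


incidents = [
    Incident("INC-1", "email", "mail-server-01", "alice"),
    Incident("INC-2", "email", "mail-server-01", "bob"),
    Incident("INC-3", "network", "router-07", "carol"),
    Incident("INC-4", "email", "mail-server-01", "dave"),
]
candidates = find_problem_candidates(incidents)
print(candidates)
```

Each returned group is a candidate for a new problem record with the listed incidents linked to it. A real tool would likely compare more conditions than category and CI.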

Problem logging

Every detected problem has to be logged in a problem record for tracking purposes. It is vital to capture problem details such as problem type, description, associated incidents, affected CIs from the CMDB, category, user information, status, resolution and closure. This information is needed to tag known errors and manage them in a database. Every problem record has two key attributes: impact and urgency. Impact refers to the number of users and CIs affected by the problem. Urgency refers to how quickly a resolution is needed. Together, these two factors determine the Service Level Agreement (SLA), which sets the due-by date for problem resolution. The Problem Management team also relies on this information to perform root cause analysis. A service desk ticketing system enables problem logging by capturing all relevant details using a form template. Generating problem reports from this data becomes easier when the database is complete.
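One common way to turn impact and urgency into a due-by date is a priority matrix. The sketch below assumes a three-level scale, a five-level priority and SLA targets expressed in days; the `PRIORITY` and `SLA_DAYS` values and the `problem_due_date` name are invented for illustration, and each organization defines its own.

```python
from datetime import datetime, timedelta

# Hypothetical matrix: PRIORITY[impact][urgency] -> priority (1 is highest).
PRIORITY = {
    "high":   {"high": 1, "medium": 2, "low": 3},
    "medium": {"high": 2, "medium": 3, "low": 4},
    "low":    {"high": 3, "medium": 4, "low": 5},
}

# Hypothetical SLA targets, in calendar days, per priority.
SLA_DAYS = {1: 2, 2: 5, 3: 10, 4: 20, 5: 30}


def problem_due_date(impact, urgency, logged_at):
    """Return (priority, due_by) for a problem logged at `logged_at`."""
    if impact not in PRIORITY or urgency not in PRIORITY[impact]:
        raise ValueError(f"unknown impact/urgency: {impact!r}/{urgency!r}")
    priority = PRIORITY[impact][urgency]
    return priority, logged_at + timedelta(days=SLA_DAYS[priority])


priority, due = problem_due_date("high", "medium", datetime(2024, 1, 1, 9, 0))
print(priority, due.isoformat())
```

Keeping the matrix in data rather than in branching logic makes it easy to review and adjust when SLA terms change.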

Investigation & Diagnosis

Prioritization and categorization of problem records help in selecting which problem record to investigate. During investigation, stakeholders discuss possible root causes. RCA is carried out using one of the various Problem Management techniques available, and the problem diagnosis is finalized once RCA is completed. Investigation involves cross-team collaboration, and diagnosis is performed by the Problem Research team. When investigating a problem record, it is recommended to search the KEDB first to find out whether it is a known problem.

KEDB

After diagnosis, the problem record is either added to the known error database (KEDB) or closed with a permanent solution. Investigation and diagnosis may produce a workaround that solves the issue temporarily; services are restored with the help of this workaround until a permanent resolution is found. As soon as a workaround is found, it is added to the KEDB. It is important to keep the KEDB up to date. Whenever an incident or problem arises in future, the service desk agent refers to this database first to check for a possible workaround.
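The KEDB lookup described above can be pictured as a small in-memory store. This is a rough sketch only: the `KnownError` fields, the `KEDB` class and its naive keyword matching are assumptions, and a real KEDB would sit in the service desk tool's database with proper search.

```python
from dataclasses import dataclass


@dataclass
class KnownError:
    error_id: str
    symptoms: str
    root_cause: str
    workaround: str


class KEDB:
    """Hypothetical known error store keyed by error id."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries[entry.error_id] = entry

    def search(self, text):
        """Return entries whose symptoms contain every word in `text`."""
        words = set(text.lower().split())
        if not words:
            return []
        return [
            entry for entry in self._entries.values()
            if words <= set(entry.symptoms.lower().split())
        ]


kedb = KEDB()
kedb.add(KnownError("KE-1", "vpn connection timeout after login",
                    "expired gateway certificate",
                    "reconnect through the backup gateway"))
kedb.add(KnownError("KE-2", "printer queue stuck",
                    "spooler service crash",
                    "restart the spooler service"))

matches = kedb.search("VPN timeout")
for entry in matches:
    print(entry.error_id, "->", entry.workaround)
```

An agent handling a new incident would run such a search first and apply the stored workaround if a match is found.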

Resolution

Problem resolution involves other ITIL modules such as Change Management and Release Management. In order to fix the problem permanently, a new change has to be raised. Change Management handles the evaluation, planning and execution of changes. The Problem Management team raises the request by submitting a Request for Change (RFC). The change team evaluates the impact and plans the change. A suitable change type is used: standard, normal or emergency. Release Management is responsible for the actual deployment of approved changes. This involves packaging the change and testing it in a sandbox environment before it is rolled out to the production environment. It is necessary to document the resolution provided to the user and to associate the problem record with the respective change and release records. Closure can be handled through automation.

Problem Management techniques

There are different Problem Management techniques available. Let us discuss some of the popular techniques that can be implemented easily.

Brainstorming

Brainstorming means discussing the problem statement and possible causes with key stakeholders. It involves group discussion and encourages full participation.

  • Round robin discussion that involves all members

  • Generates high volume of ideas in a shorter time span

  • Faster method and produces diverse set of ideas

Kepner Tregoe Problem analysis

A logical approach to problem-solving that starts with problem definition and elaboration. Possible causes are vetted, then tested, and finally the true cause is identified. This is a systematic four-phase Root Cause Analysis (RCA) method for complex problem analysis. Kepner Tregoe (KT) is applicable to both proactive and reactive problem management, since it involves problem analysis as well as potential problem analysis. The four phases are:

  • Situation Appraisal

  • Problem analysis

  • Decision analysis

  • Potential problem analysis

Possible causes        Evidence              Result
Memory issue           Memory leakage        Cause
Server speed issue     Log files             Cause
Data retrieval issue   Configuration issue   Not a cause

Cause and effect analysis

Cause and effect analysis describes the relationships between a problem and its possible causes. This method, also known as the Ishikawa or fishbone diagram, analyses the primary and secondary causes of a problem. Causes are grouped into categories such as people, product, process and partners. For example, a network outage might have causes such as router malfunction, configuration error or natural disaster. This method is used for reactive problem management, and it works from the effect back to its causes, so it is important to define the problem statement precisely.

  • List down all possible causes for an effect / situation

  • Suitable for complex problem analysis

  • Includes many possible causes and contributing factors

  • Discuss action items to improve the process

5 Whys

The 5 Whys strategy is a simple technique for finding the root cause by asking successive “why” questions. It is one of the Six Sigma techniques used to identify the actual root cause of a problem and to take appropriate countermeasures to prevent it from occurring in the future. It helps to reveal the relationships between various root causes. However, it is important to frame the questions properly in order to arrive at the actual root cause. Asking “why” five times is just a rule of thumb, and the number varies depending on the complexity of the problem.
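The questioning chain can be written down as a simple structure in which each answer becomes the subject of the next "why". The sketch below is illustrative only: the `five_whys` helper and the outage scenario are invented, and the technique itself is a facilitated discussion rather than an algorithm.

```python
def five_whys(problem, answers):
    """Pair each successive "why" question with its answer.

    `answers` is the ordered list of answers collected in the session.
    Returns (chain, root_cause); the last answer is treated as the
    candidate root cause.
    """
    if not answers:
        raise ValueError("at least one answer is needed")
    chain = []
    current = problem
    for answer in answers:
        chain.append((f"Why? {current}", answer))
        current = answer
    return chain, answers[-1]


# Invented example session, for illustration only.
chain, root_cause = five_whys(
    "The web portal was unavailable",
    [
        "The application server ran out of memory",
        "A reporting job loaded a full table into memory",
        "The job had no row limit configured",
        "The job template ships without a default limit",
    ],
)
for question, answer in chain:
    print(f"{question} -> {answer}")
print("Candidate root cause:", root_cause)
```

Here the chain stops after four questions, which is consistent with five being a rule of thumb rather than a fixed count.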

Proactive vs Reactive Problem Management

Reactive Problem Management

Reactive Problem Management reacts to recurring incidents by analysing the root cause and providing a long-term fix. It is crucial to identify these repeating incidents as problems. Incident Management aims to restore services as fast as possible and therefore often misses the underlying cause of incidents. The Incident Management team transfers such incidents to the Problem Management team for detailed research and analysis. This handover is crucial, and its timing is important in order to maintain service integrity.

The Incident Management team should pass on information such as incident category, affected CIs, criticality and impact. The Reactive Problem Management process uses this information to perform a detailed RCA, submit an RFC and update the problem record in the KEDB. Reactive Problem Management starts with checking incident patterns, which includes reviewing past incidents in the service desk. It has two parts:

  • Problem control – Happens during the investigation phase discussed above. It deals with root cause analysis and identifying the actual cause of the problem, and it converts problems into known errors.

  • Error control – Happens during the resolution phase. It reduces the number of known errors in the KEDB by finding permanent solutions for them.

Proactive Problem Management

Proactive Problem Management acts as a gatekeeper by continuously identifying potential issues and avoiding them. It does not wait for incidents to occur; it aims to prevent incidents and problems from occurring in the future. This preventive approach involves big data and trend analysis: patterns are identified from historical incident and problem data so that potential issues can be avoided. It requires analysis of past incident data and major events, asset health checks and situational appraisal. Kepner Tregoe analysis is an example of a proactive Problem Management technique that deals with data analysis. Other examples include maintenance activities and periodic audits.

  • Reduces firefighting activities

  • Prevents major IT failures and thus acts as a gatekeeper

  • Improves efficiency and maintains productivity

Interrelationships with other ITIL modules

Incident Management

Problem Management typically starts once Incident Management has restored the service. A problem record can be created either from one or more incidents or on its own. Problem Management deals with the analysis of recurring incidents and finding their root cause. Incident Management shares information such as the incident description, users impacted, assets impacted and criticality. Problem Management uses this information to identify whether the issue is a known error or not. Therefore, Incident Management acts as a prerequisite to Problem Management in most cases.

Change Management

When a permanent fix requires a change, Problem Management hands over to Change Management to execute it. The RCA from Problem Management is crucial for Change Management to understand the associated risk and urgency, and a detailed RCA simplifies the change evaluation phase. The Change Management process delivers the permanent fix by rolling out new changes and decides the change schedule depending on problem impact and criticality. The Change Advisory Board (CAB) involves relevant stakeholders from the Problem Research team to assess the planned change. Known errors or known problems result in a Request for Change (RFC). Relevant problems are associated with the change record for better execution.

Configuration Management

Recurring incidents call for an asset health check in order to find the cause. While Problem Management owns root cause analysis, it is essential to work closely with the Configuration Management team to understand asset details, the asset owner, interdependencies with other assets, impact and vendor-related information. With the help of these details, the Problem Research team suggests the next step: either execute a new change to the Configuration Item (CI) or provide a suitable workaround. These two modules are closely connected, and the problem analysis phase revolves around CIs in order to minimize the impact.

Knowledge Management

Problem Management leverages Knowledge Management by accessing the central repository and solution database. Knowledge base articles are fundamental to trend analysis. For both proactive and reactive Problem Management, knowledge base articles help in speedy resolution. Relevant knowledge articles are associated with the problem record. Known errors and their workarounds are stored in the knowledge base as well, so the KEDB is a subset of the broader Knowledge Management system. After a permanent solution is found, it is stored in Knowledge Management for future reference.

DOs

  • Learn from past incidents. Analyze patterns and eliminate major incidents with data analysis. This saves a lot of time and resources.

  • Integrate Problem Management with other ITIL modules for information sync and consistency. Associations across Incident, Problem and Change records make cross-referencing easier.

  • Assign a dedicated Problem Manager with clear roles and responsibilities to execute the Problem Management process as per ITIL standards. The Problem Manager acts as a liaison between the Incident Manager and the Change Manager.

  • Plan an effective communication strategy across Change Management, Incident Management and Configuration Management. As soon as a workaround is found, it is essential to communicate it to the related incident owners, who in turn communicate it to affected end users. To be effective, leverage the automation capabilities available in the service desk tool.

  • Understand the proactive as well as the reactive approach to Problem Management. Both are necessary, and each is useful in certain scenarios, so it is important to understand the differences between the two approaches and their process flows.

  • Understand that Problem Management has its own SLA and that it is important to resolve problems before the due date. The SLA is decided based on priority, and the priority of the related incidents carries over to the problem.

  • Learn various Problem Management techniques to find out the actual root cause.

DON’Ts
  • Don’t treat Problem Management as the same thing as Incident Management. The two ITIL processes work hand in hand but they are entirely different. However, the Problem Management team does learn from incident data and past records.

  • Don’t reinvent the wheel. The first step is to check the Known Error Database (KEDB), which is a repository of known problems along with their workarounds. This is integral to Problem Management.

  • Don’t forget to document the resolution in detail. The problem resolution is used by multiple teams, such as Incident Management, to communicate with end users. Therefore, be thorough in documentation.

  • Don’t ignore any step in Problem Management process flow. Follow every step as described above.

Problem Management Key Performance Indicators (KPIs)

  • No. of problem records reported

  • Average resolution time

  • Percentage of problems resolved within SLA

  • Total no. of known errors

  • Problem backlog - No. of problems unresolved

  • Total no. of Incidents associated to problems

  • Percentage of problems with identified root cause

  • Percentage of problems with a workaround
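Most of the KPIs listed above can be derived directly from problem records. The sketch below assumes a simplified `ProblemRecord` with hypothetical field names, and it assumes that the SLA percentage is calculated over resolved problems only; both are illustrative choices rather than ITIL definitions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ProblemRecord:
    problem_id: str
    opened: datetime
    due: datetime
    resolved: Optional[datetime] = None
    root_cause: Optional[str] = None
    workaround: Optional[str] = None
    incident_count: int = 0


def pct(part, whole):
    """Percentage rounded to one decimal; 0.0 when `whole` is zero."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def problem_kpis(records):
    total = len(records)
    resolved = [r for r in records if r.resolved is not None]
    within_sla = [r for r in resolved if r.resolved <= r.due]
    seconds = sum((r.resolved - r.opened).total_seconds() for r in resolved)
    return {
        "problems_reported": total,
        "backlog": total - len(resolved),
        "avg_resolution_days": seconds / len(resolved) / 86400 if resolved else None,
        "pct_resolved_within_sla": pct(len(within_sla), len(resolved)),
        "pct_with_root_cause": pct(sum(r.root_cause is not None for r in records), total),
        "pct_with_workaround": pct(sum(r.workaround is not None for r in records), total),
        "incidents_linked": sum(r.incident_count for r in records),
    }


records = [
    ProblemRecord("PRB-1", datetime(2024, 1, 1), datetime(2024, 1, 10),
                  resolved=datetime(2024, 1, 5), root_cause="memory leak",
                  workaround="nightly restart", incident_count=4),
    ProblemRecord("PRB-2", datetime(2024, 1, 2), datetime(2024, 1, 6),
                  resolved=datetime(2024, 1, 8), root_cause="config drift",
                  incident_count=2),
    ProblemRecord("PRB-3", datetime(2024, 1, 3), datetime(2024, 1, 20),
                  incident_count=1),
]
kpis = problem_kpis(records)
print(kpis)
```

In practice these figures would come from the reporting module of the service desk tool; the point of the sketch is only to make the definitions concrete.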

Problem Manager roles and responsibilities

The Problem Manager role does not exist in many organizations, but it is fundamental for companies to realize the importance of this ITIL process. A Problem Manager acts as a liaison between Incident Management and Change Management.

  • Responsible for Problem Management process

  • Lifecycle management of problems

  • Maintains the quality and integrity of problem records

  • Acts as a liaison across different teams such as Incident Management, Change Management

  • Defines and maintains the process flow

  • Continuous review and improvement of Problem Management process

  • Coordinates between various stakeholders to identify the root cause of a problem and find a workaround or solution

  • Prevents incidents from occurring

  • Responsible for production and maintenance of KEDB

  • Ensures that the right resources are available to investigate and identify the root cause of a problem

  • Trend analysis of historical incident data

  • Ensures problems are resolved within SLA

  • Problem Manager logs RFC when necessary

  • Periodical reports on performance of Problem Management team and cost benefit analysis of RCA

Feature checklist

  • Create, modify and delete problem records
  • Search problem records
  • Filter problems based on created date, assigned agent, requester, status, priority and category
  • Ability to mark a problem as a known error
  • Create multiple dashboards to store relevant problem records
  • Ability to add a detailed root cause and attach relevant files
  • Placeholder to add impact and symptoms
  • Ability to add a solution - permanent or workaround
  • Integrated knowledge management module within Problem Management solution
  • Ability to add or remove solution articles from Knowledge base within the problem record
  • Ability to add or remove the right Configuration Items (CI) to problem record
  • Ability to assign tasks to other people in the same team or other team
  • Automated email notifications based on events
  • Associate the related incidents for better reference
  • SLA information and due by date visibility
  • Ability to maintain and search in KEDB
  • Ability to associate a change record to this problem record
  • Unique Problem identification number for future reference
  • Ability to export problem records
  • Reporting and analytics based on problem data

Benefits

Having discussed the various aspects of Problem Management, it is necessary to highlight the business benefits of Problem Management.

  • Improved service availability - Proactive Problem Management ensures uninterrupted service and avoids major incidents

  • Consistent service quality - A high quality service is essential for service excellence

  • Reduced costs - Major incidents are avoided and subsequent costs are saved

  • Improved customer satisfaction - Problem Management provides a permanent solution to recurring incidents that improves end user satisfaction

  • Improved overall productivity - Performing RCA and fixing an issue permanently ensures seamless business operation

Which practice is responsible for moving components to live environments?

The deployment management practice is used to move new or changed hardware, software, documentation, processes, or any other component to a live environment.

Which practice ensures that service actions that are a normal part of service delivery are effectively handled?

The purpose of the service request management practice is to support the agreed quality of a service by handling all pre-defined, user-initiated service requests in an effective and user-friendly manner.

Which is an activity of the problem management practice?

Problem management activities aim to identify, assess, and control risks in any of the four dimensions of service management. Therefore, it may be useful to adopt risk management tools and techniques.

Which practice is the responsibility of everyone in the organization ITIL 4?

Continual Improvement is the responsibility of everyone at all levels of an organization. In ITIL 4, Continual Improvement appears throughout the Service Value System in two forms: the Improve activity in the Service Value Chain (SVC) and the Continual Improvement practice.