According to ITIL, Problem Management seems pretty clear-cut: determine what the root cause is, work towards resolving it, and remove it as a potential cause of Incidents going forward. This, however, is not always as clear-cut as it may appear. With Problems that are generated from Incidents, you generally have a root cause identified, or are at least given a playing field to begin in, once the service has been restored by fixing the known root cause or implementing a workaround.

The reality is, there is not always going to be a clear-cut workaround or a definitive root cause. Take performance issues, for example. Something is clearly impacting the service and the user community is generating tickets, but when Problem Management analyzes them, no single area can be identified as the culprit. In cases like this, every layer needs to be analyzed, poked, prodded and explored to determine where the root cause may be hiding.

Today's technology environment is complex. The number of dependencies and integration points is not always as clearly mapped out as it should be. What may appear, from the end user's perspective, to be application-level slowness may actually be the result of something in the network infrastructure itself or of other services consuming bandwidth on the network. Tracking down and clearly identifying the source is a daunting task.

This raises the question of how many layers and types of Problem Management there are, and how they are documented with regard to the types of service the Problem Management function provides. In my own daily practice, I have been able to identify three layers of Problem Management that would need some form of service model built around them, documented and associated with service level agreements. They are:

Proactive Problem Management

  • Service Desk Problem Management
  • Escalation/Performance Problem Management


Reactive Problem Management

  • Incident generated Problem Management


Regardless of the model implemented in an organization, there is no possible way to focus on all of the above from a single Problem Management role or function. Each area has its own level of process and practice.
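One way to picture such a tiered model is as a small routing table that sends each Problem record to one of the three layers. The sketch below is only an illustration: the `source` values, the `ProblemRecord` fields and the routing rules are assumptions made up for this example, not anything ITIL prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """The three layers listed above."""
    SERVICE_DESK = "Service Desk Problem Management"
    ESCALATION_PERFORMANCE = "Escalation/Performance Problem Management"
    INCIDENT_GENERATED = "Incident generated Problem Management"


# Proactive/Reactive grouping as given in the list above.
CATEGORY = {
    Tier.SERVICE_DESK: "Proactive",
    Tier.ESCALATION_PERFORMANCE: "Proactive",
    Tier.INCIDENT_GENERATED: "Reactive",
}


@dataclass
class ProblemRecord:
    summary: str
    # Hypothetical values: "major_incident", "escalation", "service_desk".
    source: str


def route(record: ProblemRecord) -> Tier:
    """Pick a tier from where the Problem came from (assumed rules)."""
    if record.source == "major_incident":
        return Tier.INCIDENT_GENERATED
    if record.source == "escalation":
        return Tier.ESCALATION_PERFORMANCE
    # Anything else is treated as routine Service Desk work.
    return Tier.SERVICE_DESK
```

In practice each tier would also carry its own process description and service level agreement; those are left out here because they differ from one organization to the next.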

I would argue that the easiest of the three is the Problem Management response to Major Incidents. In these cases, the root cause is almost always identified during the investigation and diagnosis phase of the response to the Incident. Problem Management is generally handed a clear timeline of events, the actions performed during resolution, and any other details leading to the root cause. A caveat here is that this is highly dependent on whether your Incident Management process is structured to capture and track all life-cycle information during the Incident.

Conversely, the most difficult level of Problem Management is the one that deals with incident tickets generated as the result of escalations or performance-related incidents through the Service Desk. These waters are often muddied because the tickets are generated by the user community, whose view can be very skewed by perception. What may be an issue for one user may be acceptable to another. Tracking down the reality is a very difficult process. Combine that with not having a clear direction in which to look for the root cause, and individuals can be led down many different paths, with distractions pulling them away from the original Problem. It is what I refer to as the "Mall Effect": you intend to go in for one thing and end up wasting time going in and out of every other store, eventually forgetting why you went to the mall in the first place.

In terms of developing a Problem Management program, the way ITIL describes the process is very focused on Problem Management as it applies to a Service Desk environment. Known Error databases are housed and maintained, for the most part, at this layer of an organization. That is, if you have one at all. Problem Management outside of the Service Desk can quickly become more like Project Management. That is where some sort of service model or description should be developed up front and presented to leadership to help them understand that Problem Management is not a single role, function or process, but should be approached according to the various levels at which it can be used. Without that description of services, it is easy for many to assume that everything is a Problem that needs the attention of Problem Management. Not structuring the Problem Management organization with a tiered model is a good way to set the program up for failure from day one.

It would be interesting to hear how Problem Management is approached in other organizations. Sharing that sort of information would be of immense value to anyone just starting out in developing a Problem Management program.