ITSM Uncovered

Life and times in ITSM

  • The Complexity of Problems

    • 4 Mar 2010
    • ITIL Incident Management Knowledge Management Problem Management Service Desk

    According to ITIL, Problem Management seems pretty clear-cut: determine what the root cause is, work towards resolving it, and remove it as a potential cause of Incidents going forward. In practice, however, it is not always as clear-cut as it appears. With Problems that are generated from Incidents, you generally have a root cause identified, or at least a playing field to begin in, once the service has been restored by fixing the known root cause or implementing a workaround.

    The reality is that there is not always going to be a clear-cut workaround or a definitive root cause. Take performance issues, for example. There is clearly something impacting the service and tickets are being generated by the user community, but when Problem Management analyzes them, there is no single area that can be identified as the culprit. In cases like this, every layer needs to be analyzed, poked, prodded and explored to determine where the root cause may be hiding.

    Today's technology environment is complex. The dependencies and integration points are numerous and not always as clearly mapped out as they should be. What appears, from the end user's perspective, to be application-level slowness may actually be the result of something in the network infrastructure itself, or of other services consuming bandwidth on the network. Tracking down and clearly identifying the source is a daunting task.

    This raises the question of how many layers and types of Problem Management there are, and how they are documented with regard to the types of service the Problem Management function provides. In my own daily practice, I have been able to identify three layers of Problem Management that would need some form of service model built around them, documented and associated with service level agreements. They are:

    Proactive Problem Management

    • Service Desk Problem Management
    • Escalation/Performance Problem Management


    Reactive Problem Management

    • Incident generated Problem Management


    Regardless of the model implemented in an organization, there is no possible way to focus on all of the above from a single Problem Management role or function. Each area has its own level of process and practice.
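
    As a rough illustration, the layers listed above could be written down as a simple routing table, so that each new Problem lands with the function built to handle it. This is only a sketch: the `source` values, the `ProblemRecord` fields and the `route` function are names I made up for the example, not anything ITIL or a particular tool prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Approach(Enum):
    PROACTIVE = "Proactive Problem Management"
    REACTIVE = "Reactive Problem Management"


class Layer(Enum):
    SERVICE_DESK = "Service Desk Problem Management"
    ESCALATION_PERFORMANCE = "Escalation/Performance Problem Management"
    INCIDENT_GENERATED = "Incident generated Problem Management"


# Grouping taken directly from the list in this post.
LAYER_APPROACH = {
    Layer.SERVICE_DESK: Approach.PROACTIVE,
    Layer.ESCALATION_PERFORMANCE: Approach.PROACTIVE,
    Layer.INCIDENT_GENERATED: Approach.REACTIVE,
}

# Hypothetical mapping from where a Problem came from to the layer that owns it.
SOURCE_TO_LAYER = {
    "major_incident": Layer.INCIDENT_GENERATED,
    "escalation": Layer.ESCALATION_PERFORMANCE,
    "performance": Layer.ESCALATION_PERFORMANCE,
    "service_desk": Layer.SERVICE_DESK,
}


@dataclass
class ProblemRecord:
    summary: str
    source: str  # one of the hypothetical keys in SOURCE_TO_LAYER


def route(problem: ProblemRecord) -> Layer:
    """Return the layer that should own this Problem, or fail loudly."""
    try:
        return SOURCE_TO_LAYER[problem.source]
    except KeyError:
        raise ValueError(
            f"no Problem Management layer defined for source {problem.source!r}"
        )
```

    The point of failing loudly on an unknown source is the same as the point of the service model: if nobody has agreed which layer owns a type of Problem, that gap should be visible rather than quietly landing on a single catch-all role.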

    I would argue that the easiest of the three is the Problem Management response to Major Incidents. In these cases, root cause is almost always identified during the investigation and diagnosis phase of the response to the Incident. Problem Management is generally handed a clear timeline of events, the actions performed in the resolution and any other details leading to root cause. A caveat here is that this is highly dependent on whether your Incident Management process is structured to capture and track all life-cycle information during the Incident.

    Conversely, the most difficult level of Problem Management is the one fed by incident tickets raised through the Service Desk as the result of escalations or performance-related incidents. These waters are often muddied because the tickets are generated by the user community, which can be very skewed by perception. What may be an issue for one user may be acceptable to another. Tracking down the reality is a very difficult process. Combine that with not having a clear direction in which to look for the root cause, and individuals can be led down many different paths and distracted from the original Problem. It is what I refer to as the "Mall Effect": you intend to go in for one thing and end up wasting time going in and out of every other store, eventually forgetting why you went to the Mall in the first place.

    When it comes to developing a Problem Management program, ITIL's description of the process is very focused on Problem Management as it applies to a Service Desk environment. Known Error databases are housed and maintained, for the most part, at this layer of an organization. That is, if you have one at all. Problem Management outside of the Service Desk can quickly become more like Project Management. That is where some sort of service model or description should be developed up front and presented to leadership, to help them understand that Problem Management is not a single role, function or process, but should be approached at the various levels where it can be used. Without that description of services, it is easy for many to assume that everything is a Problem that needs the attention of Problem Management. Not structuring the Problem Management organization with a tiered model is a good way to set the program up for failure from day one.

    It would be interesting to hear how Problem Management is approached in other organizations. Sharing that sort of information would be of immense value to anyone just starting out in developing a Problem Management program.

  • Events, Incidents & Problems

    • 7 Oct 2009
    • Event Management Incident Management Problem Management

    There seems to be some confusion with these three words as concepts of ITIL and in some cases everyday life. This is more about how they apply when things go south in Technology. Often Event, Incident and Problem are used in place of one another when talking about something that breaks. It can get confusing so I thought I would take a moment to help put my spin on how these three words break down.

    Event - The Big Bang

    An event is just that, an event. O.k., that didn't work very well, let me try that again. Think of an event as a point in time when something occurs. Power fails, a car crashes, a tornado touches down or an earthquake strikes. Each one of these is an example of an Event, that moment when something occurs. It can be confusing because in Technology, something can crash or fail and be classified as an event, but often that is a symptom of the event that occurred. Take a Blue Screen, for example. You might think that the Blue Screen of Death is the machine just giving up and quitting. The reality is that in most cases the Blue Screen is a symptom of the event that occurred. This could be a rogue application, bad memory or something else that made the Operating System crash. So determining the Event in Technology can at times be somewhat subjective.

    Event Management is the practice of putting a process in place that can spot a potential event ahead of time or detect an event as soon as it occurs. In most cases that is done through monitoring tools or automation. The goal is to get on top of these things well before they cause impact or loss of service to the customer. Take the Blue Screen example above. If the issue was an application taking up too much memory due to bad code, monitoring could detect that ahead of time and trigger an alarm so that someone responds to the memory issue before it results in a Blue Screen. If not, you could have monitoring on the machine itself that detects when the box becomes unresponsive and triggers an alert. So an Event, and Event Management, is all about living life for the moment something could or does go wrong.
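
    To make the memory example a little more concrete, here is a minimal sketch of the kind of check a monitoring tool performs. The thresholds, the `Alert` fields and the `check_memory` function are all assumptions for illustration; real monitoring products have their own configuration and would collect the memory reading for you.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    host: str
    severity: str
    message: str


# Hypothetical thresholds; tune these to the service being monitored.
WARN_PERCENT = 80.0
CRITICAL_PERCENT = 95.0


def check_memory(host: str, used_percent: Optional[float]) -> Optional[Alert]:
    """Turn one memory sample into an Alert, or None if all is well.

    used_percent is None when the host did not answer the poll at all,
    which covers the "box becomes unresponsive" case.
    """
    if used_percent is None:
        return Alert(host, "critical", "host unresponsive")
    if used_percent >= CRITICAL_PERCENT:
        return Alert(host, "critical", f"memory at {used_percent:.0f}%")
    if used_percent >= WARN_PERCENT:
        # Early warning: someone can respond before the crash happens.
        return Alert(host, "warning", f"memory at {used_percent:.0f}%")
    return None
```

    The warning level is the "ahead of time" part of Event Management, while the unresponsive check is the fallback for when the early warning was missed.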

    Incident - Whatever it takes

    So, an Event occurs and things break. This is where you have to start thinking about life as the user. Someone somewhere was doing something that is now impacted and not working anymore. Everything that person was depending on is referred to as the Service being provided. It has been my experience that the user or consumer of that service does not give two shakes about what makes things go, only that they cannot go any more.

    For the most part, this applies to almost everything in our lives. If my TV channels go out, do I really care what it took Verizon to get them to me? Do I care about their cables, switches, fiber, blah blah...? NO, I just want my damn TV back on! So what do you do? You call their support desk just to ask one simple question: "When will it be back on?" That is all you care about as the customer. Not what broke, who is fixing it or how it broke, but "How long will it take to fix it!" That is the core point of an Incident: everything that happens between the moment what you as the customer want is not available and the moment it is actually back up and running.

    That whole process falls under the world of Incident Management. The ability to manage all of the moving parts that are needed to get service restored to the customer. It is the very human element of life in Technology. The one that is responsible for command and control, providing updates to management and the customers while coordinating all of the groups working on the problem. So an Incident could be summed up as the actual lifecycle of a failure. Whatever it takes, just get the service back up and we will figure out the rest later.

    Problem - Workarounds are not forever

    Where it could be said that an Incident is the reactive response to an event, Problems are generally proactive: identifying issues that are known to exist and have the potential to go wrong, but are managed through known workarounds.

    An easy analogy for a Problem would be following a recipe that calls for sugar, knowing you don't have any and using honey instead. That is your workaround. The dish still comes out o.k., and the customers may be satisfied, but it is not made the correct way and is not as good as it would be with sugar. Problem Management is about realizing that you have made several dishes using honey instead of sugar and making plans to go to the store to buy some sugar. When you buy it, the problem goes away.

    For the most part, Incidents feed Problems. If the goal of Incident Management is to restore service as fast as possible and by any means necessary, including workarounds, then Problem Management is responsible for reviewing those workarounds and replacing them with permanent solutions so that the incident never occurs again.

    The proactive nature of Problem Management does not exist solely for Incidents; it can also apply to Event Management. Think of all of the alarms or alerts that may be occurring in a single day. Problem Management could also collect and analyze that data looking for trends where there is a risk of failure. A single alarm may seem innocent enough; ten or twenty in a week may point to a larger 'problem'. HA! That is why it is Problem Management.
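
    The trending idea can be sketched in a few lines. This is a simplified example, not how any particular monitoring tool does it: the `(timestamp, source)` alarm format, the `recurring_alarms` name and the default threshold of ten alarms in seven days are assumptions made for the illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# Assumed alarm shape: (when it fired, what raised it).
Alarm = Tuple[datetime, str]


def recurring_alarms(
    alarms: Iterable[Alarm],
    now: datetime,
    window: timedelta = timedelta(days=7),
    threshold: int = 10,
) -> Dict[str, int]:
    """Count alarms per source inside the window and keep the noisy ones.

    Sources that fired at least `threshold` times are candidates for a
    Problem record, even if no single alarm caused an Incident.
    """
    cutoff = now - window
    counts = Counter(source for ts, source in alarms if cutoff <= ts <= now)
    return {source: n for source, n in counts.items() if n >= threshold}
```

    Anything this returns is not an Incident yet; it is a prompt for Problem Management to ask why the same alarm keeps firing.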

    So there you have it, my interpretation of Event, Incident and Problem and why each needs to be understood and applied as its own individual process. The beauty is that although this is how they are applied in IT, they can also be applied in many other areas of life.

  • About

    A collaborative site bringing the real world of IT Service Management, and all of the struggles that come along with the territory, to anyone who is interested in understanding more about the practical implementation of industry frameworks and people management in Technology.
